A biobank is any organized collection of biological material that was once part of a living organism or was produced by one. While this blog post focuses on human biospecimen repositories, the fundamental principles discussed are relevant for many, if not most, other biobank types. The terms specimen, biospecimen and sample are used interchangeably throughout.
Biospecimens have long been a key asset in evidence-based medicine. In fact, many ‘lab values’ in modern healthcare are derived from blood, urine and other biological samples. One can argue that whole medical disciplines, such as oncology, are founded on biospecimen-based diagnoses. Beyond their significance in patient care, biospecimens also play a pivotal role in biomedical research. They help us understand disease mechanisms and develop new molecular diagnostics and therapeutics.
The value of a biobank is determined not just by the quality of the banked specimens but also by the quality, richness, and representation of the information associated with them.
Samples collected today are meant for tomorrow's research. Improvement is therefore needed now in standardization, in automated enrichment of annotations from hospital information systems and disease registries, in insight into overlapping collections across different forms of tissue banking, and in cooperation within national and international networks.
1. There is a Standards Deficit
There are billions of specimens held in biobanks worldwide. However, their utility has been hampered by the absence of globally accepted biobanking standards, especially when it comes to appropriate biospecimen annotation.
Standardization of sample quality, form, and analysis is an important unmet need and a prerequisite for gaining the full benefit from collected samples. Coupled to this is the provision of annotation that describes clinical status, along with metadata on the measurements of clinical phenotype that characterize the sample. We have not yet reached consensus on how to collect, manage, and build biobank archives so that these efforts translate into value for the patient.
The need for standardized annotations is driven by the expanding scope of biospecimen-dependent research. Research studies require not only high-quality samples but also adequate numbers of them. Sample cohorts are often defined by very specific characteristics, and biobanks need to enable the discovery of relevant specimens via a consistent and searchable set of parameters. Precision medicine requires increasingly specific specimen subtypes in order to detect the molecular characteristics of disease at an increasingly personalized level. Reaching sufficient sample sizes for this requires either pooling specimens from multiple biobanks or collecting specimens across multiple institutions or hospitals into a large central biobank. Unless biorepositories use a common standard for describing their specimens, it becomes difficult to know what is available without consulting each biorepository separately.
2. Standards are Interoperability-Driven
In recent years, federated biobanking has become a viable model for increasing the accessibility of biospecimens across many localized collections. Federated biobanks operate by decoupling basic biorepository logistics, the LIMS (Laboratory Information Management System) aspect, from presenting biospecimens to researchers for sample cohort discovery. Multiple biorepositories share a common online database, also known as a “Virtual Biobank”. 5AM Solutions has developed an open source software product called Biolocator to support virtual biobanks.
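The cohort discovery idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Biolocator or any other product works: the catalog contents, the field names, and the functions `find_cohort` and `count_by_repository` are all hypothetical.

```python
from collections import Counter

# Hypothetical shared catalog: each repository publishes specimen summaries
# using the same field names and values. All data here is made up.
CATALOG = [
    {"repository": "biobank_a", "specimen_type": "tissue", "diagnosis": "melanoma"},
    {"repository": "biobank_a", "specimen_type": "blood", "diagnosis": "melanoma"},
    {"repository": "biobank_b", "specimen_type": "tissue", "diagnosis": "melanoma"},
    {"repository": "biobank_b", "specimen_type": "tissue", "diagnosis": "glioma"},
]


def find_cohort(catalog, **criteria):
    """Return catalog entries whose fields equal every given criterion."""
    return [
        entry for entry in catalog
        if all(entry.get(field) == value for field, value in criteria.items())
    ]


def count_by_repository(entries):
    """Count matching specimens per contributing repository."""
    return Counter(entry["repository"] for entry in entries)


matches = find_cohort(CATALOG, specimen_type="tissue", diagnosis="melanoma")
print(count_by_repository(matches))
```

A single query works here only because both repositories describe their specimens with the same fields and values. If one of them recorded "FFPE block" where the other recorded "tissue", the search would silently miss specimens.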
Data standards describe the structure of exchanged information: entity names, data element names, descriptions, definitions and formatting rules. When used for sample annotation, the standards enable interoperability among federated biobanks by ensuring that the interacting parties share the same understanding of the shared biospecimens. They also facilitate data exchange with systems that manage clinical data, for example, EHR (Electronic Health Record) and CDMS (Clinical Data Management System) systems. Data standards are also essential for integrating biorepositories with LIMS solutions and with molecular data repositories that manage biospecimen-derived data.
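To make the notion of a data element concrete, here is a minimal Python sketch of how a name, a definition and a formatting rule might be bundled together and used to check an annotation record. The `DataElement` class, the two example elements and their permitted values are assumptions made for illustration; they are not taken from any published standard.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One hypothetical data element for specimen annotation."""
    name: str                              # data element name
    definition: str                        # human-readable definition
    pattern: Optional[str] = None          # formatting rule (regular expression)
    permitted: Optional[frozenset] = None  # enumerated values, if any

    def validate(self, value: str) -> bool:
        """Return True if value satisfies this element's rules."""
        if self.permitted is not None and value not in self.permitted:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        return True


# Illustrative elements only; not part of any real standard.
SPECIMEN_TYPE = DataElement(
    name="specimen_type",
    definition="Kind of biological material banked.",
    permitted=frozenset({"blood", "urine", "tissue"}),
)
COLLECTION_DATE = DataElement(
    name="collection_date",
    definition="Calendar date of collection, formatted YYYY-MM-DD.",
    pattern=r"\d{4}-\d{2}-\d{2}",
)


def validate_record(record: dict, elements: list) -> list:
    """Return the names of elements that are missing or invalid in record."""
    return [
        e.name for e in elements
        if e.name not in record or not e.validate(record[e.name])
    ]


record = {"specimen_type": "saliva", "collection_date": "03/01/2014"}
print(validate_record(record, [SPECIMEN_TYPE, COLLECTION_DATE]))
```

Two systems that agree on such definitions can reject or flag a nonconforming record at the point of exchange, rather than discovering the mismatch later during analysis.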
3. Data Standards are Underused In Research Biobanking
Most small-scale biobanks in the research realm use no standards at all when annotating their assets. Even larger biobanking operations map their biospecimen annotations to only a few selected controlled vocabularies, which we will discuss later. The key drivers for these selections are interoperability concerns with other departments, institutions, or systems. The larger the biobanking context, the more likely it is that standard vocabularies will be used to map annotations across the biospecimens managed by all participating parties. It is also important to note that the vocabularies currently used in biobanking are not perfect; they provide a starting point for systematic biospecimen annotation. Local extensions are frequently required, because the standard vocabularies do not always cover all the information contained in modern biospecimen annotations. This is true in particular for many preanalytic parameters, i.e. factors and conditions that influence biospecimen properties and characteristics before analysis.
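The interplay between a standard vocabulary and local extensions can be sketched as a simple lookup with a fallback. Everything in this Python example is hypothetical: the labels, the `STD:` and `LOCAL:` codes (placeholders, not entries from SNOMED, ICD or any real vocabulary) and the `map_term` function.

```python
# Hypothetical mapping from one biobank's local labels to shared codes.
STANDARD_CODES = {
    "whole blood": "STD:0001",
    "serum": "STD:0002",
    "ffpe tissue": "STD:0003",
}

# Local extensions cover concepts the shared vocabulary lacks,
# for example a preanalytic parameter.
LOCAL_EXTENSIONS = {
    "cold ischemia time": "LOCAL:0001",
}


def map_term(local_label: str) -> tuple:
    """Map a label to (code, source).

    source is 'standard', 'local', or 'unmapped'; code is None when unmapped.
    """
    key = local_label.strip().lower()
    if key in STANDARD_CODES:
        return STANDARD_CODES[key], "standard"
    if key in LOCAL_EXTENSIONS:
        return LOCAL_EXTENSIONS[key], "local"
    return None, "unmapped"


for label in ["Serum", "Cold ischemia time", "plasma"]:
    print(label, map_term(label))
```

Recording the source alongside the code keeps local extensions visible, so partners in a shared context can see which annotations they will not be able to interpret without an additional agreement.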
4. Data Standards Are Recommended By OBBR and ISBER
Both the Office of Biorepositories and Biospecimen Research (OBBR), and the International Society for Biological and Environmental Repositories (ISBER) explicitly recommend the use of Common Data Elements (CDEs) and controlled vocabularies for biospecimen annotation to ensure system interoperability and maximum research reuse.
Data should be electronically convertible into formats that can easily be shared among collaborating institutions, where possible and appropriate. The inventory management system should enforce all data integrity, security and audit trail requirements for external access. To achieve interoperability, inventory management systems should do the following:
- Have a public documented Application Programming Interface (API) to enable other systems to integrate with it.
- Use common public vocabularies for relevant data points (e.g., SNOMED, ICD9-CM, ICD10, ICDO).
Biospecimen resources should employ a uniform, nonredundant vocabulary (e.g., Cancer Biomedical Informatics Grid [caBIG®] common data elements [CDEs]) for clinical data.
While the ISBER guidelines are a good start, they lack the detail and specificity needed to truly enable biobanking interoperability. The OBBR recommendation is more precise but references the caBIG® standard, which, for various reasons, failed to gain widespread adoption in production biobanking solutions. Among those reasons were its tight coupling to the caTissue application and a lack of sufficient funding to implement (and improve) the standard in existing biobanking systems.
Next: In our next post, we will discuss the use of controlled vocabularies in biobanking and how that basic form of data standard helps interoperability and research.