Last week we discussed three controlled vocabularies used in biobanking. This week, we discuss data standards that build on those vocabularies to define shared data models for structuring and exchanging biobanking data.
CAP Standard For Clinical Pathology
Biospecimens banked by clinical pathology departments are tracked in systems ancillary to electronic medical records, so, unsurprisingly, they are managed with document-based solutions that generate and store pathology reports.
“Content and structure of clinical pathology reports are somewhat standardized by templates and guidelines published by the College of American Pathologists (CAP). For example in 2013 CAP produced 46 cancer protocols as a resource to pathologists to aid in effectively reporting surgical pathology findings necessary to provide quality patient care.”
However, just as in their EHR counterparts, the structured information in these systems is limited to administrative and billing concerns, while pathology and clinical data live in the narrative text, often with little consistent use of terminology from controlled vocabularies (CVs). As with electronic medical records, these pathology reports must be processed by sophisticated natural language processing (NLP), by human curators, or by both before they can be used for structured biospecimen annotation. This is also apparent in the cancer protocols cited above: SNOMED CT was originally developed by CAP and is used in the protocol vocabularies, yet no concept codes are referenced in the PDF or Microsoft Word documents, which are intended solely for human consumption.
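To make the curation problem concrete, here is a deliberately crude, rule-based stand-in for NLP: a few regular expressions that pull labeled values out of an invented narrative report. The field names and patterns are hypothetical and are not taken from the CAP protocols; real reports vary far too much for this to hold up in practice, which is exactly why NLP pipelines and human curators are needed.

```python
import re

# Hypothetical patterns for a few labeled fields. Real pathology narratives are
# far less regular, which is why NLP tools and human curators are needed.
FIELD_PATTERNS = {
    "histologic_type": re.compile(r"histologic type:\s*(.+)", re.IGNORECASE),
    "histologic_grade": re.compile(r"histologic grade:\s*(\w+)", re.IGNORECASE),
    "tumor_size_cm": re.compile(r"tumor size:\s*([\d.]+)\s*cm", re.IGNORECASE),
}


def extract_fields(report_text: str) -> dict[str, str]:
    """Return the first value found for each field, scanning line by line."""
    found: dict[str, str] = {}
    for line in report_text.splitlines():
        for field, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if match and field not in found:
                found[field] = match.group(1).strip()
    return found


# Invented sample report, for illustration only.
report = """FINAL DIAGNOSIS
Histologic Type: Invasive ductal carcinoma
Histologic Grade: 2
Tumor Size: 2.3 cm in greatest dimension
"""
print(extract_fields(report))
# {'histologic_type': 'Invasive ductal carcinoma', 'histologic_grade': '2', 'tumor_size_cm': '2.3'}
```

Even when extraction succeeds, the result is still free text such as "Invasive ductal carcinoma"; mapping it to a SNOMED CT concept code is a separate step, since the documents themselves carry no concept codes.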
NCI Data Standards Initiatives
The National Cancer Institute (NCI) funded two efforts that advanced data standards in biobanking: EDRN and caBIG®. The Early Detection Research Network (EDRN) was initiated in 1998 to improve methods for detecting the signatures of cancer cells. The cancer Biomedical Informatics Grid (caBIG®), introduced in 2004, was intended to link researchers, physicians, and patients throughout the cancer community. caBIG® ended in 2011, largely because of its focus on developing an “overly complex and ambitious software enterprise of NCI-branded tools...” that gained only “...limited traction in the cancer community,” as described by the caBIG Board of Scientific Advisors Ad Hoc Working Group.
Of particular relevance here, caBIG® also developed the Common Biorepository Model (CBM) to reduce the time and effort researchers spend locating a biobank that holds the specimens they need. The goal of the CBM is to share selected key information so that a single search can span multiple biobanks. The CBM embodies the idea that data should conform to a simple, standardized domain model as a means of promoting sharing. Even though caBIG® has ended, the standard has been implemented by a number of commercial vendors and is also used by the NCI Specimen Resource Locator, a database that helps researchers locate human specimens for cancer research. The 5AM Biolocator also supports the CBM.
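The payoff of a shared model can be sketched in a few lines. Below, each biobank publishes summaries using the same record type, so one query can run over all of them. The `SpecimenSummary` fields, the `federated_search` function, and the example catalogs are hypothetical simplifications for illustration, not the actual CBM schema or any NCI service API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenSummary:
    """Hypothetical shared record; a stand-in for a CBM-like model, not the CBM itself."""
    biobank: str
    specimen_type: str
    diagnosis: str
    available_count: int


def federated_search(catalogs, specimen_type, diagnosis):
    """Run one query across every biobank catalog that uses the shared record type."""
    return [
        summary
        for catalog in catalogs
        for summary in catalog
        if summary.specimen_type == specimen_type
        and summary.diagnosis == diagnosis
        and summary.available_count > 0
    ]


# Invented example catalogs from two biobanks.
bank_a = [
    SpecimenSummary("Bank A", "serum", "breast carcinoma", 120),
    SpecimenSummary("Bank A", "FFPE tissue", "breast carcinoma", 0),
]
bank_b = [
    SpecimenSummary("Bank B", "serum", "breast carcinoma", 45),
    SpecimenSummary("Bank B", "serum", "colon carcinoma", 80),
]

for hit in federated_search([bank_a, bank_b], "serum", "breast carcinoma"):
    print(hit.biobank, hit.available_count)
# Bank A 120
# Bank B 45
```

The exact string comparison only works because every biobank uses the same terms for specimen type and diagnosis; that agreement is what the controlled vocabularies from last week provide.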
Table 1: CBM top-level concepts
EDRN focuses on enabling biomarker detection and requires that specimens be collected, processed, and annotated in a standardized manner, with a set of common data elements (CDEs) collected for each specimen. CDEs were also an integral part of the caBIG® interoperability framework. A CDE is essentially a formalized description of a piece of information: it contains a name and an exact definition of the information's specific meaning (semantics: the concept mapping) and its representation (syntax: the data type and format).
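A CDE can be pictured as a small record that pairs meaning with representation. The following sketch, assuming Python 3.9+, models the parts named above; the class, its fields, the example element, and the placeholder concept code are hypothetical and do not reproduce any registered NCI CDE.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical minimal CDE: a name plus semantics (concept) and syntax (type, format)."""
    name: str
    definition: str
    concept_code: str       # semantics: reference to a controlled-vocabulary concept
    data_type: type         # syntax: expected Python type of the value
    value_format: str = ""  # syntax: optional regex the whole value must match

    def validate(self, value) -> bool:
        """Check a value against the element's syntax (type and format)."""
        if not isinstance(value, self.data_type):
            return False
        if self.value_format:
            return re.fullmatch(self.value_format, str(value)) is not None
        return True


collection_date = CommonDataElement(
    name="Specimen Collection Date",
    definition="The date on which the specimen was collected from the donor.",
    concept_code="EXAMPLE:0001",  # placeholder, not a real vocabulary code
    data_type=str,
    value_format=r"\d{4}-\d{2}-\d{2}",
)
print(collection_date.validate("2013-05-17"))  # True
print(collection_date.validate("05/17/2013"))  # False
print(collection_date.validate(20130517))      # False
```

Note that `validate` checks only the syntax half; the semantic half lives in `concept_code`, which a real system would resolve against a terminology service.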
BRISQ And SPREC
There have been a number of efforts to develop and introduce standards specific to biobanking. While the controlled vocabularies discussed last week are centered on the clinical realm, standards like SPREC and BRISQ concentrate on the formal description of preanalytical parameters. These are parameters such as sample collection, processing, and storage conditions, which can significantly alter a biospecimen's molecular composition and consistency. Such preanalytical factors can, in turn, influence experimental outcomes and the reproducibility of scientific results.
The Standard PREanalytical Code (SPREC) was developed by the International Society for Biological and Environmental Repositories (ISBER). It encodes the main preanalytical factors of clinical fluid and solid biospecimens, and of their simple derivatives, as a compact “specimen barcode”. SPREC was introduced in 2010 and is intended to become an internationally recognized code within the clinical biobanking sector.
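Because a SPREC code is a fixed sequence of hyphen-separated elements, software can split it back into named preanalytical factors. This sketch handles only fluid samples; the seven element names are our paraphrase of SPREC's fluid elements and should be checked against ISBER's documentation, and the input string uses placeholder values rather than a real SPREC code.

```python
# Seven fluid-sample element names, paraphrased from SPREC as we understand it;
# verify against ISBER's current SPREC documentation before relying on them.
FLUID_ELEMENTS = (
    "type_of_sample",
    "type_of_primary_container",
    "pre_centrifugation_delay",
    "centrifugation",
    "second_centrifugation",
    "post_centrifugation_delay",
    "long_term_storage",
)


def parse_sprec_fluid(code: str) -> dict[str, str]:
    """Split a hyphen-separated fluid SPREC string into named elements.

    Values are returned as opaque strings; interpreting them requires the
    official SPREC code tables, which this sketch does not include.
    """
    parts = code.strip().upper().split("-")
    if len(parts) != len(FLUID_ELEMENTS):
        raise ValueError(
            f"expected {len(FLUID_ELEMENTS)} elements, got {len(parts)}"
        )
    return dict(zip(FLUID_ELEMENTS, parts))


# Placeholder values only; this is not a real SPREC code.
fields = parse_sprec_fluid("XXX-YYY-A-B-C-D-E")
print(fields["type_of_sample"], fields["long_term_storage"])  # XXX E
```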
The Biospecimen Reporting for Improved Study Quality (BRISQ) recommendations arose from a workshop, Development of Biospecimen Reporting Criteria for Publications, held at the 2009 NCI Biospecimen Research Network Symposium to initiate a discussion on biospecimen reporting. The recommended data elements discussed include general information for consistently documenting classes of biospecimens, as well as factors that might influence the integrity, quality, and/or molecular composition of biospecimens.
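BRISQ-style reporting lends itself to an automated completeness check before publication. The sketch below flags recommended elements that are missing or empty in a sample record; the element names and the sample record are an illustrative subset invented for this example, not the official BRISQ list.

```python
# Illustrative subset of element names chosen for this sketch; NOT the official
# BRISQ list, which should be consulted directly.
RECOMMENDED_ELEMENTS = (
    "biospecimen_type",
    "anatomical_site",
    "collection_mechanism",
    "stabilization_type",
    "storage_temperature",
    "storage_duration",
)


def missing_elements(record: dict) -> list[str]:
    """Return recommended elements that are absent or empty in a sample record."""
    return [name for name in RECOMMENDED_ELEMENTS if record.get(name) in (None, "")]


# Invented sample record.
record = {
    "biospecimen_type": "plasma",
    "anatomical_site": "peripheral blood",
    "storage_temperature": "-80 °C",
    "storage_duration": "",
}
print(missing_elements(record))
# ['collection_mechanism', 'stabilization_type', 'storage_duration']
```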
“The purpose of reporting these details is to supply others, from researchers to regulatory agencies, with more consistent and standardized information to better evaluate, interpret, compare, and reproduce the experimental results.”
“It is hoped that consideration of the BRISQ recommendations will sensitize the biobanking and research communities and their funding agencies to the importance of tracking preanalytical variables, leading to more judicious selection and handling of experimental human specimens and thus improved study quality.”
-- Biospecimen Reporting for Improved Study Quality (BRISQ)
Conclusion
Despite significant progress in the formulation of biobanking standards, including biospecimen annotation and reporting guidelines, biospecimen-dependent research still suffers from a widespread lack of adoption of these standards by the biobanking community at large.
“… lack of international harmonization, uneven adoption, and insufficient oversight of best practices are preventing further improvements in biospecimen quality and coordination among collaborators and biobanking networks.”
-- The Evolution of Biobanking Best Practices
“The lack of data-standards-driven biospecimen annotations essentially secludes millions of valuable samples from an increasingly global biobanking market. According to an August 2012 Infiniti Research report titled ‘Global Biobanking Market 2011-2015,’ the biobanking market will increase 30% from 2011 to 2015 to nearly $183 billion.”
-- Global Biobanking Market 2011-2015
It is time that biobanking administrators, especially in academic medical centers, adopt a more long-term vision when deciding how to manage their biological sample collections. While better sample annotations will require greater investments in infrastructure, logistics, and personnel, they will yield a significant return on investment, both economically and scientifically. Precision medicine, as the name suggests, requires precise information about biospecimen donors and about sample collection, processing, and storage conditions, as well as systematic capture of sample composition and pathology. Applying data standards to sample annotations at the source of the respective information is vital to maximizing the value of biospecimens for biomedical research and translational medicine.
Roland Hannes Niedner
Jim McCusker