Up at 5AM: The 5AM Solutions Blog

3 Languages Your Biobank Can Speak - Exploring Controlled Vocabularies

Posted on Tue, Oct 29, 2013 @ 06:00 AM

Previously, we discussed the mandate for using data standards in biobanking. Here we will show how biomedical controlled vocabularies, a kind of consensus data standard, are used to improve data quality and interoperability. Data interoperability standards, e.g. HL7 and the ISO 11179 Common Data Element (CDE) standard, make use of controlled vocabularies.




Controlled vocabularies, sometimes simply called “vocabularies”, are tools used to standardize information for the purposes of capturing, storing, exchanging, searching, and analyzing data. A controlled vocabulary is a restricted list of words or terms used for labeling, indexing, or categorizing. It is controlled because only terms from the list may be used for the subject area covered by the controlled vocabulary.
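As a minimal sketch of what “controlled” means in practice, the Python snippet below rejects any value that is not on the approved list. The name `SPECIMEN_TYPES`, the function `validate_term`, and the terms themselves are hypothetical placeholders chosen for illustration; they are not drawn from any published standard.

```python
# Hypothetical controlled vocabulary for a "specimen type" field.
# The terms are illustrative only, not taken from a published standard.
SPECIMEN_TYPES = frozenset({"whole blood", "plasma", "serum", "tissue"})


def validate_term(value: str, vocabulary: frozenset = SPECIMEN_TYPES) -> str:
    """Return the normalized term, or raise ValueError if it is not on the list."""
    normalized = value.strip().lower()
    if normalized not in vocabulary:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return normalized


print(validate_term("Plasma"))  # accepted after normalization
try:
    validate_term("blood juice")  # free text that is not on the list
except ValueError as err:
    print(err)
```

Rejecting free text at the point of data entry is one simple way to keep a field consistent enough to search and exchange later.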

Some of the best-known standard vocabularies created for healthcare are the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), the Logical Observation Identifiers Names and Codes (LOINC®), and the Unified Medical Language System (UMLS®). We will discuss each standard and what it is used for.


The Systematized Nomenclature of Medicine Clinical Terms, or SNOMED CT, is a general-purpose vocabulary for the medical domain and claims to be “the most comprehensive, multilingual clinical healthcare terminology in the world”. It contains concepts in multiple languages about the entire medical domain and provides a hierarchy of those concepts. It also provides multiple terms per concept, allowing lookup of terms with greater or lesser formality. However, the hierarchy is not as rigorous as it could be. For instance, among the top-level classes, “organism” is separate from “physical object”, so the hierarchy should be taken with a grain of salt rather than used as a firm class hierarchy. SNOMED CT is therefore most useful as a way to align words (terms) from multiple languages to concepts (language-independent, or translatable, ideas), which can be used to standardize the annotations or data values of data models like HL7.
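The idea of aligning several terms, formal or informal and in different languages, to one concept can be sketched as a simple lookup table. This is only an illustration: the identifiers `CONCEPT-001` and `CONCEPT-002` and the helper `concept_for` are hypothetical, and real SNOMED CT content uses its own identifier scheme and distribution files.

```python
from typing import Optional

# Hypothetical concept identifiers; these are NOT real SNOMED CT codes.
# Several terms (formal, informal, other languages) point to one concept.
TERM_TO_CONCEPT = {
    "myocardial infarction": "CONCEPT-001",  # formal English term
    "heart attack": "CONCEPT-001",           # informal English term
    "infarto de miocardio": "CONCEPT-001",   # Spanish term
    "fracture of femur": "CONCEPT-002",
    "broken thigh bone": "CONCEPT-002",
}


def concept_for(term: str) -> Optional[str]:
    """Look up the concept identifier for a term, ignoring case and padding."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


# Two differently worded annotations resolve to the same concept.
print(concept_for("Heart attack") == concept_for("infarto de miocardio"))
```

Storing the concept identifier rather than the typed term is what lets records annotated in different languages, or with different levels of formality, be compared directly.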



The next standard, Logical Observation Identifiers Names and Codes (LOINC®), is more specialized, as it aims to be a “universal code system for identifying laboratory and clinical observations” rather than a comprehensive vocabulary for the health domain. LOINC is a taxonomy like SNOMED CT, but is more organized and has nearly-sensible top-level concepts, possibly benefiting from its narrower scope. While we haven’t conducted a rigorous review of the hierarchy, it seems more likely to be usable as a basis for a class hierarchy. However, the top-level concepts seem to be categories of classes, rather than being classes themselves. This is sometimes appropriate for conceptual hierarchies, but is bad modeling in class hierarchies. LOINC is also used in HL7 and other data standards.


The Unified Medical Language System (UMLS®) is a “meta vocabulary” published by the National Library of Medicine, providing mappings between SNOMED CT, LOINC, and many other vocabularies. The UMLS team describes it as “a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.” It is not itself a controlled vocabulary, but is a useful means to find appropriate vocabularies for particular applications, and to translate concepts from one controlled vocabulary to others.
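Translating a code from one vocabulary to another through a shared meta-concept can be sketched as follows. This is a toy model of the general idea, not the UMLS file format or API: the vocabulary names `VOCAB_A` and `VOCAB_B`, the codes, the `META-…` identifiers, and the `translate` function are all hypothetical.

```python
from typing import Optional

# Hypothetical crosswalk: each meta-concept lists its code in each
# source vocabulary. None of these identifiers are real codes.
CROSSWALK = {
    "META-001": {"VOCAB_A": "A-100", "VOCAB_B": "B-7"},
    "META-002": {"VOCAB_A": "A-200"},  # no equivalent in VOCAB_B
}


def translate(code: str, source: str, target: str) -> Optional[str]:
    """Map a code from the source vocabulary to the target vocabulary.

    Returns None if the code is unknown or has no equivalent in the target.
    """
    for codes in CROSSWALK.values():
        if codes.get(source) == code:
            return codes.get(target)
    return None


print(translate("A-100", "VOCAB_A", "VOCAB_B"))
print(translate("A-200", "VOCAB_A", "VOCAB_B"))
```

Note that a mapping is not guaranteed to exist in every direction, so callers need to handle the case where no equivalent code is found.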


Other Vocabularies

There are many other controlled vocabularies used for interoperability standards, including the National Cancer Institute Thesaurus (NCI Thesaurus), used in cancer research; the International Classification of Diseases (ICD), used to describe diseases; and the Current Procedural Terminology (CPT), used to describe clinical procedures. These vocabularies, and many more, including many ontologies, have been documented at the National Center for Biomedical Ontology’s (NCBO) BioPortal, which also provides tools and APIs for suggesting vocabularies and individual concepts to use, as well as tools for authoring new vocabularies and ontologies.



Stay tuned for next week's post, where we will explore data standard initiatives for biobanking. In the meantime, feel free to learn how you can import, publish, and share your biospecimens on the web using the biolocator tool.





Jim McCusker

Roland Hannes Niedner





Tags: biobanking, biospecimen, SNOMED_CT, Vocabularies

