Up at 5AM: The 5AM Solutions Blog

Highlights from the Conference on Semantics in Healthcare and Life Sciences (CSHALS)

Posted on Wed, Mar 05, 2014 @ 10:04 AM

This was the fourth time I've attended the Conference on Semantics in Healthcare and Life Sciences (CSHALS), and every time I come back with new ideas. This year's conference put a much greater emphasis on implementation than in the past. Considering that the conference has been running for seven years, that marks a very clear evolution from its more speculative origins. Organized by the International Society for Computational Biology (ISCB), it's perhaps the best blend I've seen of people from industry and academia, centered on applying semantic technologies and strategies to biomedical research.


Linked Data



Callimachus: Content Management for Your Data

The tutorial for Callimachus gave me a pleasant surprise. Callimachus is an open source content management system that is driven by semantic web standards and Linked Data, and that describes itself as "fanatically web-compliant". Written in Java, it is available in Open Source and Enterprise editions. It supports the usual CMS capabilities, like authoring documents in HTML or DocBook (with a WYSIWYG editor), but it excels at letting you build Rails-like applications by simply writing XHTML templates for creating, viewing, and editing instances of classes. Since the underlying RDF model is flexible, the templates themselves determine which properties are added and shown on the instances, and conventional XHTML and DocBook pages make it easy to fill in the gaps. The user interface also makes it easy either to create these templates (as well as JavaScript and CSS) from within the deployed application or to import and export an application using a Callimachus ARchive (CAR) file.


From the Callimachus Composite RDF-XHTML Tutorial.

RDF is a Competitive Differentiator

Ron Collette, the CIO of Foundation Medicine, gave a keynote at CSHALS that highlighted the competitive advantages of semantic technologies. Pointing to the flexibility and integration capabilities of SPARQL, Ron suggested that it is the only serious option for federating queries across enterprise databases. It also makes it possible to effectively represent and query data with "extreme cardinality". That means data with lots of many-to-many relationships (each expressed as a single statement, rather than requiring a special join table), data where one-to-many relationships scale to tens of thousands of instances on the "many" side, and data with deep transitive relationships among lots of entities. Conventional SQL databases simply don't perform well on data like this, and NoSQL databases are also challenged by it.
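To make the "extreme cardinality" point a little more concrete, here is a minimal sketch in plain Python (no RDF library) of how many-to-many and transitive relationships look when every fact is a single subject-predicate-object statement. The entity and predicate names (`geneA`, `associatedWith`, `subClassOf`, and so on) are made up for illustration and did not come from the keynote. A real triple store would answer the transitive question with something like a SPARQL 1.1 property path rather than a hand-written loop.

```python
# Each fact is one (subject, predicate, object) statement.
# All names below are hypothetical, chosen only for illustration.
triples = {
    ("geneA", "associatedWith", "disease1"),
    ("geneA", "associatedWith", "disease2"),
    ("geneB", "associatedWith", "disease1"),
    ("disease1", "subClassOf", "disorderX"),
    ("disorderX", "subClassOf", "conditionY"),
}


def objects(subject, predicate):
    """Objects directly linked from `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """Subjects directly linked to `obj` by `predicate` (the reverse direction)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def transitive_objects(subject, predicate):
    """Everything reachable from `subject` by following `predicate` one or more times."""
    seen = set()
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        for nxt in objects(node, predicate):
            if nxt not in seen:  # guards against cycles in the data
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Many-to-many: no join table, just more statements with the same predicate.
print(objects("geneA", "associatedWith"))
print(subjects("associatedWith", "disease1"))

# Deep transitive relationship: walk subClassOf as far as it goes.
print(transitive_objects("disease1", "subClassOf"))
```

The same set of statements answers both directions of the many-to-many question and the transitive one, which is the flexibility the keynote was pointing at. Whether this performs well at scale depends entirely on the database behind it.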


The Clinical Data Interchange Standards Consortium (CDISC) is also relying on semantic technologies for interoperability. As part of this effort, they use RDF in clinical documents, which means that development can truly happen incrementally: once a model is developed, it can be implemented in place. Systems like Callimachus are able to ingest those models to quickly create semantic applications from them.

A recent case study also showed that the use of semantic technologies at Hoffmann-La Roche will save $60M over the next three years by allowing businesses to respond faster to informatics challenges.


Graph Databases Compete Hard

The graph database vendors themselves also made a big splash at CSHALS in the tech talks. Systap has been developing an RDF database called BigData (named before the more common use of the term) that provides horizontal scaling within cloud environments. I used a single instance of their server for the poster I mentioned in my previous post about big data analytics. They are also working on a GPU-based graph query system that looks exciting. Bryan Thompson, one of Systap's principals, made an interesting point about SPARQL and RDF standards: because RDF databases all rely on the same underlying model and open standards, it is much easier for customers to do competitive analyses. This means that, for instance, query optimization and performance become differentiators. By contrast, he argued, similar NoSQL offerings such as Neo4j haven't tried as hard to improve performance, because their diversity makes comparisons between databases harder.


Another database vendor that presented was YarcData, a subsidiary of Cray. They presented Urika, a scalable graph supercomputer that can run in-memory queries over billions of RDF statements. This is an exciting prospect for those of us who would like to perform deep graph analytics on how biological entities interact.

Nanopublications Enable Big Data Science

Finally, nanopublications are starting to make a big impact. Barend Mons gave a keynote that revolved around producing and using knowledge expressed as nanopublications. Deborah McGuinness covered similar ground in her keynote, which included my related work (she is my PhD advisor) and described how Rensselaer is extending this approach to help patients participate in their own treatments. Open PHACTS is an interesting case study because it is a collaboration between pharmaceutical companies and researchers to build a common knowledge base that everyone can build their research on, and it is based on the nanopublication framework.

Overall, the transition this year from theoretical and proof-of-concept systems to systems that provide real competitive advantages has been great to see. Please let us know if we can help you understand how semantic technologies can support your informatics strategy.


- Jim McCusker










Tags: Data Science, semantics, SPARQL, CSHALS

