This was my fourth time attending the Conference on Semantics in Healthcare and Life Sciences (CSHALS), and every time I come back with new ideas. This year's conference placed a much greater emphasis on implementation than in the past. For a conference that has been running for seven years, that marks a very clear evolution from its more speculative origins. Organized by the International Society for Computational Biology (ISCB), it's perhaps the best blend I've seen of people from industry and academia centered on applying semantic technologies and strategies to biomedical research.
Callimachus: Content Management for Your Data
The tutorial for Callimachus gave me a pleasant surprise. Callimachus is an open source content management system driven by semantic web standards and Linked Data, and it is "fanatically web-compliant". Written in Java, it is available in Open Source and Enterprise editions. It supports the usual CMS capabilities, such as authoring documents in HTML or DocBook (with a WYSIWYG editor), but it excels at letting you build Rails-like applications simply by writing XHTML templates for creating, viewing, and editing instances of classes. Because the underlying RDF model is flexible, the templates themselves determine which properties are added to and shown on the instances, while conventional XHTML and DocBook pages make it easy to fill in the gaps. The user interface also makes it easy either to create these templates (along with JavaScript and CSS) from within the deployed application or to import and export an application as a Callimachus ARchive (CAR) file.

From the Callimachus Composite RDF-XHTML Tutorial.
RDF is a Competitive Differentiator
Ron Collette, the CIO of Foundation Medicine, gave a keynote at CSHALS that highlighted the competitive advantages of semantic technologies. Pointing to the flexibility and integration capabilities of SPARQL, Ron suggested that it is the only serious option for federating queries across enterprise databases. It also makes it possible to effectively represent and query data with "extreme cardinality": data containing lots of many-to-many relationships (each pairing is expressed as a single statement, rather than requiring a dedicated join table), one-to-many relationships with tens of thousands of instances on the "many" side, and deep transitive relationships among large numbers of entities. Conventional SQL databases simply don't perform well on data like this, and NoSQL databases also struggle with it.
#CSHALS keynote Ron Collette: when we need to federate queries across enterprise databases, the only serious option is SPARQL.
— Jim McCusker (@jpmccu) February 27, 2014
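To make "extreme cardinality" a little more concrete, here is a minimal, stdlib-only Python sketch of a tiny in-memory triple set. It is not how any particular triple store works, and the gene and pathway identifiers are hypothetical. It shows that a many-to-many relationship is just more triples, queryable in either direction, and that a deep transitive relationship can be followed with a breadth-first walk, roughly what a SPARQL 1.1 property path such as `ex:partOf+` asks a store to evaluate.

```python
from collections import defaultdict, deque

# Hypothetical example data: the "ex:" identifiers are illustrative, not a real vocabulary.
TRIPLES = {
    # Many-to-many: each gene-pathway pairing is one more statement, no join table.
    ("ex:geneA", "ex:participatesIn", "ex:pathway1"),
    ("ex:geneA", "ex:participatesIn", "ex:pathway2"),
    ("ex:geneB", "ex:participatesIn", "ex:pathway1"),
    # A chain of part-of links that a transitive query should follow.
    ("ex:pathway1", "ex:partOf", "ex:process1"),
    ("ex:process1", "ex:partOf", "ex:process2"),
    ("ex:process2", "ex:partOf", "ex:process3"),
}


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All s such that (s, predicate, obj) is asserted (the reverse direction)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def transitive_objects(triples, subject, predicate):
    """Nodes reachable from subject via one or more predicate edges.

    Roughly what SPARQL 1.1 evaluates for: SELECT ?o WHERE { subject predicate+ ?o }
    """
    # Index the edges for this predicate once, then walk them breadth-first.
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            edges[s].add(o)
    reached = set()
    queue = deque(edges[subject])
    while queue:
        node = queue.popleft()
        if node in reached:
            continue  # already visited; also guards against cycles
        reached.add(node)
        queue.extend(edges[node] - reached)
    return reached


print(sorted(objects(TRIPLES, "ex:geneA", "ex:participatesIn")))
# ['ex:pathway1', 'ex:pathway2']
print(sorted(subjects(TRIPLES, "ex:participatesIn", "ex:pathway1")))
# ['ex:geneA', 'ex:geneB']
print(sorted(transitive_objects(TRIPLES, "ex:pathway1", "ex:partOf")))
# ['ex:process1', 'ex:process2', 'ex:process3']
```

A production RDF store would answer these lookups from indexes rather than by scanning every triple, which is exactly where the performance differences between vendors show up.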
The Clinical Data Interchange Standards Consortium (CDISC) is also relying on semantic technologies for interoperability. Because CDISC uses RDF in clinical documents, development can truly happen incrementally: once a model is developed, it is already implemented in place. Systems like Callimachus can ingest those models to quickly create semantic applications from them.
At #CSHALS Frederick Malfait: RDF allows for incremental data development, because the model IS the implementation.
— Jim McCusker (@jpmccu) February 27, 2014
A recent case study also calculated that the use of semantic technologies at Hoffmann-La Roche will save $60M over the next three years by allowing the business to respond faster to informatics challenges.
#CSHALS Hoffmann-LaRoche Case Study Overview Calculated ROI – will save 60M in next 3 years using Semantic Technologies
— Joanne Luciano (@JoanneLuciano) February 27, 2014
Graph Databases Compete Hard
The graph database vendors themselves also made a big splash in the CSHALS tech talks. Systap has been developing an RDF database called Bigdata (named before the term "big data" came into common use) that provides horizontal scaling within cloud environments. I used a single instance of their server for the poster I mentioned in my previous post about big data analytics. They are also working on a GPU-based graph query system that looks exciting. Bryan Thompson, one of Systap's principals, made an interesting point about SPARQL and RDF standards: because all RDF databases rely on the same underlying model and open standards, it is much easier for customers to compare competing products. As a result, query optimization and performance become differentiators among RDF stores, whereas NoSQL graph databases such as Neo4j have had less incentive to improve performance, because their diversity makes head-to-head comparisons harder.
Bryan Thompson: Bigdata RDF store outperforms graph databases, as do other really good RDF triple stores. #cshals
— Hilmar Lapp (@hlapp) February 27, 2014
Another database vendor that presented was YarcData, a subsidiary of Cray. They presented their Urika graph supercomputer, which can run in-memory queries over billions of RDF statements. This is an exciting prospect for those of us who would like to perform deep graph analytics on how biological entities interact.
Nanopublications Enable Big Data Science
Finally, nanopublications are starting to make a big impact. Barend Mons gave a keynote that revolved around producing and using knowledge expressed as nanopublications, and Deborah McGuinness (my PhD advisor) did the same in her keynote, covering my related work and how Rensselaer is extending this approach to help patients participate in their own treatment. Open PHACTS is an interesting case study: built on the nanopublication framework, it is a collaboration between pharmaceutical companies and researchers to build a common knowledge base that everyone can base their research on.
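Under the nanopub.org guidelines, a nanopublication is a small set of RDF named graphs: an assertion (the claim itself), its provenance, and publication info, tied together by a head graph. The following minimal, stdlib-only Python sketch writes that structure out as TriG. It is only an illustration, not the Open PHACTS or nanopub.org tooling; the example.org IRIs and the `#head`/`#assertion` fragment naming are hypothetical, and every object is treated as an IRI (no literals).

```python
NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Write an IRI in N-Triples/TriG angle-bracket form."""
    return f"<{value}>"


def nanopublication_trig(base, assertion, provenance, pubinfo):
    """Render a nanopublication as TriG: a head graph plus three named graphs.

    Sketch only: every object is written as an IRI (no literals), and graph
    names are derived from `base` by a hypothetical #fragment convention.
    """
    graphs = {
        base + "#head": [
            (base, RDF_TYPE, NP + "Nanopublication"),
            (base, NP + "hasAssertion", base + "#assertion"),
            (base, NP + "hasProvenance", base + "#provenance"),
            (base, NP + "hasPublicationInfo", base + "#pubinfo"),
        ],
        base + "#assertion": assertion,    # the scientific claim itself
        base + "#provenance": provenance,  # where the claim came from
        base + "#pubinfo": pubinfo,        # who published this nanopublication
    }
    lines = []
    for graph, triples in graphs.items():
        lines.append(f"{iri(graph)} {{")
        lines.extend(f"  {iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)
        lines.append("}")
    return "\n".join(lines)


# Hypothetical example.org identifiers, for illustration only.
BASE = "http://example.org/np1"
TRIG = nanopublication_trig(
    BASE,
    assertion=[("http://example.org/geneA", "http://example.org/associatedWith",
                "http://example.org/disease1")],
    provenance=[(BASE + "#assertion", PROV + "wasDerivedFrom", "http://example.org/study42")],
    pubinfo=[(BASE, PROV + "wasAttributedTo", "http://example.org/curator")],
)
print(TRIG)
```

Keeping the claim, its provenance, and its publication metadata in separate graphs is what lets a shared knowledge base aggregate many small, individually attributable assertions.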
Overall, it has been great to see this year's transition from theoretical and proof-of-concept systems to systems that provide real competitive advantages. Please let us know if we can help you understand how semantic technologies can support your informatics strategy.