This is the first post in our data modeling series. Today we give a broad perspective on the different ways to represent knowledge and data. Some of our posts have discussed ontologies, controlled vocabularies, data models, and other kinds of knowledge representation. All of these have something in common: each sits somewhere along the Ontology (or sometimes Semantic) Spectrum.
This spectrum was developed by Deborah McGuinness and other panelists at the 1999 AAAI Ontologies Panel, and was first put in print by McGuinness in the paper “Ontologies Come of Age”. The continuum is ordered by increasing formality, that is, by how much additional information can be inferred from the base knowledge. A quick overview of the kinds of knowledge representation along the spectrum should help Batman and Robin resolve their communication issues.
Many of the older ontology spectrum diagrams focus on the ingredients needed to reach a given position on the spectrum; here I focus instead on the kinds of ontologies found along it.
A catalog, sometimes called a data dictionary, is a basic list of data elements and their permissible values. The elements are usually local to one dataset and may or may not come with definitions. Data dictionaries are often released alongside self-contained databases or datasets, and are usually formatted for human consumption as a text, PDF, or HTML document. Tagging systems also fall into this category: they rely on user-defined tags that can be reused, but the tags themselves are often not clearly defined.
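To make the idea concrete, here is a minimal Python sketch of a data dictionary used to check incoming records. The element names (`sex`, `smoking_status`, `blood_type`), their values, and the `invalid_fields` helper are hypothetical. The point is that a catalog can say which values are allowed, but nothing about what they mean.

```python
# A hypothetical data dictionary: each data element maps to its permissible
# values. There are no definitions and no links between elements.
DATA_DICTIONARY = {
    "sex": {"M", "F", "U"},
    "smoking_status": {"never", "former", "current"},
    "blood_type": {"A", "B", "AB", "O"},
}


def invalid_fields(record: dict) -> list[str]:
    """Return the fields whose values are not permissible.

    Fields that the dictionary does not list are ignored.
    """
    return [
        name
        for name, value in record.items()
        if name in DATA_DICTIONARY and value not in DATA_DICTIONARY[name]
    ]


print(invalid_fields({"sex": "F", "smoking_status": "sometimes"}))  # ['smoking_status']
```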
Glossaries are somewhat more structured, containing an identifier and a definition for each term. Glossaries that aim to be interoperable use URIs or URLs to identify their terms, but such glossaries are unusual. The key difference between a catalog and a glossary is that a glossary defines the terms it contains.
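A glossary adds an identifier and a definition to each entry. The sketch below assumes hypothetical `example.org` URIs and a `GlossaryTerm` record of our own design; it is not a standard glossary format, just an illustration of the extra structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    uri: str         # identifier; a URI lets other documents refer to the term
    label: str       # human-readable name
    definition: str  # the part a plain catalog entry may lack


# Hypothetical example.org URIs; a real glossary would mint its own.
TERMS = [
    GlossaryTerm(
        "http://example.org/glossary/catalog",
        "catalog",
        "A basic list of data elements with their permissible values.",
    ),
    GlossaryTerm(
        "http://example.org/glossary/glossary",
        "glossary",
        "A list of terms, each with an identifier and a definition.",
    ),
]

# Index by URI so a term can be looked up unambiguously.
GLOSSARY = {term.uri: term for term in TERMS}

print(GLOSSARY["http://example.org/glossary/catalog"].definition)
```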
A controlled vocabulary, sometimes called a thesaurus, has what is called a weak is-a relationship, also known as a broader/narrower relationship. It is weak because a broader concept is not necessarily a superclass: "broader" can mean part-of as well as kind-of. A thesaurus relates concepts to each other using predefined relationships and lists a number of terms (labels) that are synonyms for the same concept. The Simple Knowledge Organization System (SKOS) is a well-defined W3C standard that allows the definition of concepts that can have broader, narrower, related, exact match, and close match relationships with each other. We discussed a number of biomedical controlled vocabularies in our Controlled Vocabulary post.
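Here is a minimal sketch of a thesaurus, loosely modeled on the SKOS `prefLabel`, `altLabel`, and `broader` properties but written in plain Python rather than with an RDF library. The concept IDs and labels are hypothetical. Note the `aorta` entry: its broader concept is a whole it belongs to, not a category it is a kind of, which is exactly what makes the relationship weak. In SKOS itself, `skos:broader` is not transitive; the transitive closure is expressed with `skos:broaderTransitive`, which `broader_transitive` imitates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    pref_label: str                                    # like skos:prefLabel
    alt_labels: set[str] = field(default_factory=set)  # synonyms, like skos:altLabel
    broader: set[str] = field(default_factory=set)     # broader concept IDs, like skos:broader


# A tiny hypothetical vocabulary keyed by concept ID.
VOCAB = {
    "disease": Concept("Disease"),
    "cvd": Concept("Cardiovascular disease", broader={"disease"}),
    "mi": Concept("Myocardial infarction", {"heart attack", "MI"}, {"cvd"}),
    "cv_system": Concept("Cardiovascular system"),
    # Here "broader" means part-of, not kind-of: an aorta is not a type of system.
    "aorta": Concept("Aorta", broader={"cv_system"}),
}


def lookup(label: str) -> Optional[str]:
    """Resolve a preferred or alternative label to a concept ID, ignoring case."""
    needle = label.casefold()
    for cid, concept in VOCAB.items():
        labels = {concept.pref_label, *concept.alt_labels}
        if needle in {lbl.casefold() for lbl in labels}:
            return cid
    return None


def broader_transitive(cid: str) -> set[str]:
    """Collect every ancestor by following broader links upward."""
    seen: set[str] = set()
    stack = list(VOCAB[cid].broader)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(VOCAB[parent].broader)
    return seen


print(lookup("Heart Attack"))            # mi
print(sorted(broader_transitive("mi")))  # ['cvd', 'disease']
```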
A taxonomy provides much stronger is-a relationships between entities, and generally talks about actual categories of things. The classic taxonomy, of course, is the biological taxonomy, a hierarchy of classes of organisms, both alive and extinct. In a taxonomy, all members of a class are also members of any superclasses of that class. For instance, all Homo sapiens are mammals. This is expressed in the biological taxonomy by saying that Homo sapiens is-a mammal. In our comic, I'm sure Batman is correcting Robin because Robin created a class hierarchy that allows for classification, but does not include any additional information, such as what attributes those classes might have.
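In an object-oriented language, a taxonomy maps naturally onto a class hierarchy with no attributes, much like the one Robin built. This Python sketch uses an abbreviated slice of the biological taxonomy (one class per major rank, skipping intermediate ranks such as subphyla); the class names are our own choice.

```python
# An abbreviated slice of the biological taxonomy, one class per major rank.
# The classes carry no attributes: only the is-a links between them.
class Animalia:
    """Kingdom."""


class Chordata(Animalia):
    """Phylum."""


class Mammalia(Chordata):
    """Class."""


class Primates(Mammalia):
    """Order."""


class Hominidae(Primates):
    """Family."""


class Homo(Hominidae):
    """Genus."""


class HomoSapiens(Homo):
    """Species."""


someone = HomoSapiens()

# Strong is-a: an instance of a class is an instance of every superclass.
print(isinstance(someone, Mammalia))   # True
print(issubclass(Mammalia, Primates))  # False: the relation only goes one way
print([cls.__name__ for cls in HomoSapiens.__mro__[:-1]])
# ['HomoSapiens', 'Homo', 'Hominidae', 'Primates', 'Mammalia', 'Chordata', 'Animalia']
```

Because `HomoSapiens` inherits from `Mammalia`, the language itself enforces the strong is-a rule: there is no way to create a `HomoSapiens` that is not also a `Mammalia`.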
A data model, sometimes called a Frame Data Model or a schema, adds a specification of which properties (attributes and relations) are used by which classes. Classic examples of data models include the conventional object models of object-oriented programming languages like Java and C++. On the Semantic Web, these models are usually represented using RDF (Resource Description Framework) Schema, or RDFS, sometimes extended with a subset of the Web Ontology Language (OWL) known as RDFS+.
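Adding properties to the classes turns a bare hierarchy into a data model. This sketch uses Python dataclasses as a stand-in for a conventional object model; the `Person` and `Hero` classes and their fields are hypothetical, chosen to echo the comic.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Person:
    name: str  # attribute: a literal value


@dataclass
class Hero(Person):
    alias: str                        # attribute added by the subclass
    partner: Optional["Hero"] = None  # relation: points at another instance


robin = Hero(name="Dick Grayson", alias="Robin")
batman = Hero(name="Bruce Wayne", alias="Batman", partner=robin)

# The schema says which properties each class uses; subclasses inherit them.
print([f.name for f in fields(Person)])  # ['name']
print([f.name for f in fields(Hero)])    # ['name', 'alias', 'partner']
print(batman.partner.alias)              # Robin
```

One design difference is worth keeping in mind. In this object model, a `Hero` without an `alias` cannot be constructed at all. In RDFS, `rdfs:domain` and `rdfs:range` do not reject data; they license inferences instead, so if something has a value for a property whose domain is a class, a reasoner concludes that it belongs to that class.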
Some ontologies take the data model concept further by adding formal constraints, which allow additional logical implications. With formal constraints, classes can be declared disjoint from each other, so that nothing can be a member of both. Instances, classes, and properties can be declared identical (equivalent) to each other, so that if we agree that myNameProperty is identical to yourNameProperty, one can automatically be substituted for the other. This sort of equivalence, together with property restrictions (stating that members of a class must have a value for a property with a particular type, value, or cardinality), means that instances can be classified, that is, assigned to additional classes based on what is already known about them.
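The toy reasoner below illustrates the three kinds of formal constraint described above: property equivalence, a class defined by a property restriction, and disjointness. It is a hand-written sketch over plain Python tuples, not an OWL reasoner, and it ignores OWL's open-world semantics. Apart from `myNameProperty` and `yourNameProperty`, every class, property, and individual name is hypothetical. It assumes `Parent` is defined as "anything with at least one `hasChild` value", which is what lets an individual be classified into it.

```python
# Facts as (subject, predicate, object) triples.
FACTS = {
    ("alice", "myNameProperty", "Alice"),
    ("bob", "yourNameProperty", "Bob"),
    ("alice", "hasChild", "bob"),
    ("alice", "type", "Person"),
    ("acme", "type", "Organization"),
    ("acme", "type", "Person"),  # violates the disjointness axiom below
}

# Axioms: a toy stand-in for OWL, not OWL itself.
EQUIVALENT_PROPERTIES = {"yourNameProperty": "myNameProperty"}  # map to one canonical name
DISJOINT_CLASSES = [("Person", "Organization")]
DEFINED_BY_SOME_VALUE = {"Parent": "hasChild"}  # Parent = has at least one hasChild value


def reason(triples):
    """Apply equivalence, classify by restrictions, and report disjointness clashes."""
    # 1. Equivalence: rewrite every property to its canonical name.
    kb = {(s, EQUIVALENT_PROPERTIES.get(p, p), o) for s, p, o in triples}

    # 2. Classification: anything with a value for the property joins the class.
    for cls, prop in DEFINED_BY_SOME_VALUE.items():
        kb |= {(s, "type", cls) for s, p, _ in kb if p == prop}

    # 3. Consistency: individuals asserted to be in two disjoint classes.
    clashes = sorted(
        {s for a, b in DISJOINT_CLASSES for s, p, o in kb
         if p == "type" and o == a and (s, "type", b) in kb}
    )
    return kb, clashes


kb, clashes = reason(FACTS)
print(("bob", "myNameProperty", "Bob") in kb)  # True: the properties were merged
print(("alice", "type", "Parent") in kb)       # True: classified by the restriction
print(clashes)                                 # ['acme']
```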
Some ontology languages provide a means to write additional logical rules that further extend what can be said in an ontology. Some rule languages are restricted so that the reasoner is guaranteed to finish some day (they are decidable, for those of you who took computability). Datalog is one such language, and the Semantic Web Rule Language (SWRL) is too when its rules are limited to the so-called DL-safe fragment. Others allow much more free-form rules: Prolog is Turing-complete, so a query may never terminate, and Common Logic supports full first-order logic, which is undecidable in general.
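Rule evaluation can be sketched with the classic ancestor example, written as two Datalog-style rules and evaluated bottom-up (naive forward chaining) until no new facts appear. The names and the `ancestors` function are hypothetical. Because the rules contain no function symbols, they can only ever combine the finitely many constants already in the data, so the loop is guaranteed to reach a fixpoint; that is the sense in which Datalog is decidable.

```python
# Base facts: parent(X, Y). The names are hypothetical.
PARENT = {("ann", "bob"), ("bob", "cal"), ("bob", "dee")}


def ancestors(parent):
    """Naive bottom-up evaluation of two Datalog-style rules:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent(X, Y) with ancestor(Y, Z).
        derived = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        if derived <= ancestor:  # fixpoint reached: nothing new to add
            return ancestor
        ancestor |= derived


print(sorted(ancestors(PARENT)))
# [('ann', 'bob'), ('ann', 'cal'), ('ann', 'dee'), ('bob', 'cal'), ('bob', 'dee')]
```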
What should I use?
This will depend on what you are attempting to do with your knowledge system. If you need to provide subject tags for documents, a controlled vocabulary will work. If you need to create some data that other people can understand, a data model may work well, if it is well documented and easily extensible. If you need to ground your data in existing knowledge about a particular subject, ontologies may provide the greatest benefit.
As you consider reusing existing ontologies, though, keep in mind that each has settled somewhere along this spectrum. Its position makes it more or less suited to the use you have in mind: it would not be a good idea, for instance, to build an object model directly from the concepts defined in a controlled vocabulary, or even to create instances of those concepts. Similarly, data models and formally constrained ontologies may lack the vocabulary needed for subject tagging. The value of an ontology lies in its use, and matching the ontology to an appropriate use in your project will help determine its success.