The Blood-Brain Barrier

This blog has nothing to do with physiology. The reason for the title is that it’s a term I stole from the medical world to describe the relationship between an ontology and data that conforms to that ontology. The medical term describes an osmotic membrane around the brain that only allows particles of a certain size to pass through – it’s a major challenge for drug treatments and one of the reasons brain tumours are hard to treat. But here, I want to discuss what can and should pass between an ontology and datasets. And when I say “pass between” I really mean “relate”, so all in all, a lousy analogy.

I’m probably already in trouble for making a distinction between the ontology and the data, and yes, if I were being stricter, the data is also part of the ontology. Most people don’t think that way though – especially if they’ve come from a traditional data modelling background where you have data and data models. Occasionally you also have reference data, which is just a cheat’s way of extending a data model without having to go scorched-earth on running systems. However, even with formal ontologies you have to put them out there for people to use. When you publish them, you are effectively creating one side of the barrier. If other parties then want to use your ontology to give their data some formal structure, they’re creating the other side of the barrier. So even with super-flexible ontologies, if anyone is going to use them, there’s going to be some kind of “blood-brain barrier”.

So, what can cross the barrier? The most typical case is that only “type-instance” relationships can cross the barrier – i.e. what we see in most data model approaches:

[Figure: only type-instance relationships cross the barrier between the ontology and the data]
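To make the strict version concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `ont:` and `data:` prefixes, the example terms, and the idea of flagging anything that crosses the barrier by a relationship other than `rdf:type`. It is not how any particular standard or tool enforces the rule.

```python
# Hypothetical prefixes and terms, invented for illustration only.
ONTOLOGY_PREFIX = "ont:"

triples = [
    ("ont:Pump", "rdfs:subClassOf", "ont:Equipment"),   # stays inside the ontology
    ("data:pump_42", "rdf:type", "ont:Pump"),           # crosses as type-instance
    ("data:pump_42", "data:locatedIn", "data:site_7"),  # stays inside the data
    ("data:FirePump", "rdfs:subClassOf", "ont:Pump"),   # crosses, but not as type-instance
]


def side(term):
    """Say which side of the barrier a term lives on, judged by its prefix."""
    return "ontology" if term.startswith(ONTOLOGY_PREFIX) else "data"


def crossing_violations(triples):
    """Return the triples that cross the barrier by anything other than
    a data-side instance pointing at an ontology-side type."""
    violations = []
    for s, p, o in triples:
        crosses = side(s) != side(o)
        allowed = p == "rdf:type" and side(s) == "data" and side(o) == "ontology"
        if crosses and not allowed:
            violations.append((s, p, o))
    return violations


violations = crossing_violations(triples)
print(violations)
```

Under this rule, the only way to add `FirePump` is to change the ontology itself, which is exactly the inflexibility discussed next.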

So that’s the most basic version. Some technologies force this distinction and only permit data model and data, with a (usually implied) type-instance relationship between them. ISO 10303 does this – a data schema defined in ISO 10303-11 can be instantiated as ISO 10303-21 data. It’s simple and it’s clean and therefore easy to build reliable systems with. However, it’s not very extensible.

If there’s one universal truth in this business it is that you will have got your data model wrong. Often spectacularly, especially if the model has been derived from the most expansive and elaborate works of fiction known to mankind, the User Requirement. The problem with data models is that they tend to get implemented in ways that are hard to change in use – hard-coded APIs, relational database tables, JSON schemas, etc., all of which have thousands of lines of code relying on them. There are only two stages in major implementation programmes – too early to tell, and too late to change. It’s where most of the data quality problems come from – systems that can’t properly capture the data the users want to create, so the users just ram it into any fields that will allow it. Over time, different communities using the system develop their own cargo cults about how to populate the system. Formal ontologies and standards like RDF Schema offer some better approaches, but before we dig into that, let’s look at how the smarter data modellers used to deal with this problem – reference data!

[Figure: the type-instance barrier with reference data added]

This keeps the strict (only type-instance) approach from before but allows new classes to be “minted” in the reference data. It only solves the inflexibility problem if you create enough (and appropriate) ClassOfX classes in your ontology / data model. The idea is that the reference data can change more often than the data model because it doesn’t require system rewrites. Don’t underestimate the willingness of developers to shoot themselves (and everyone else) in the foot by hard-coding references to the reference data into a system though. This happens way more often than you’d ever imagine.
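A rough sketch of the reference data pattern, again in plain Python with made-up names (`ont:ClassOfPump`, `ref:FirePump` and the `mint_class` helper are all hypothetical). The point it tries to show is that a new class can only be minted if the published model already contains a suitable ClassOfX for it to be an instance of.

```python
# Hypothetical names throughout; this is an illustration, not a real API.

# Fixed when the ontology / data model is published.
ontology_powertypes = {"ont:ClassOfPump"}

# Grows over time without touching the model: new class -> its ClassOfX.
reference_data = {}


def mint_class(new_class, powertype):
    """Add a class to the reference data as an *instance* of a ClassOfX.

    The relationship crossing the barrier is still type-instance, so the
    strict rule holds, but it only works if the powertype already exists.
    """
    if powertype not in ontology_powertypes:
        raise ValueError(f"{powertype} is not in the ontology, cannot mint {new_class}")
    reference_data[new_class] = powertype
    return (new_class, "rdf:type", powertype)


minted = mint_class("ref:FirePump", "ont:ClassOfPump")
print(minted)

# No ClassOfValve was modelled up front, so this extension is blocked.
try:
    mint_class("ref:GateValve", "ont:ClassOfValve")
    blocked = False
except ValueError:
    blocked = True
print(blocked)
```

The second call is the failure mode described above: if the right ClassOfX was never modelled, the reference data cannot rescue you.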

The IES4 ontology takes a similar approach, but also allows for new classes to be minted in the dataset being exchanged. The IES ontology uses RDF Schema though, and RDF Schema doesn’t care what relationships go between ontologies and datasets. This has led to some debate about the necessity for creating large numbers of powertypes (classes of classes) in IES, swelling the ontology in a way that many see as unnecessary. For IES5 it is proposed that any relationships can cross the barrier:

[Figure: any relationship can cross the barrier, as proposed for IES5]

New classes can be created as needed, provided they reference back to the ontology. Similarly, the ontology can describe universal particulars (individuals) such as the World of interest. It’s simpler and cleaner, and still allows for reference data libraries. In fact, for IES5 there is likely to be a layered approach:

[Figure: the layered approach likely for IES5]
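As a rough illustration of “new classes can be created as needed, provided they reference back to the ontology”, the sketch below walks a chain of relationships from a dataset class up through a reference library to the ontology. The three prefixes (`ont:`, `ref:`, `data:`), the example terms and the `references_ontology` helper are all assumptions of mine, and the actual IES5 layering may look quite different.

```python
# Hypothetical prefixes and terms, invented for illustration only.
triples = [
    ("ref:FirePump", "rdfs:subClassOf", "ont:Pump"),             # reference library -> ontology
    ("data:DieselFirePump", "rdfs:subClassOf", "ref:FirePump"),  # dataset -> reference library
    ("data:Widget", "rdfs:subClassOf", "data:Gadget"),           # never reaches the ontology
]


def references_ontology(term, triples, seen=None):
    """True if term is in the ontology, or links to it through any chain of
    relationships. Any predicate counts, since nothing is barred from crossing."""
    if term.startswith("ont:"):
        return True
    seen = set() if seen is None else seen
    if term in seen:  # guard against cycles in the data
        return False
    seen.add(term)
    return any(
        references_ontology(o, triples, seen)
        for s, _p, o in triples
        if s == term
    )


anchored = {s: references_ontology(s, triples) for s, _p, _o in triples}
print(anchored)
```

Here the check no longer cares which relationship crosses the barrier, only that every new class is tied back to something the ontology defines.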

IES is a 4D ontology, along with others such as ISO 15926, HQDM, and IDEAS. They’ve all taken slightly different approaches to crossing the barrier. With the move to IES5, the approach is more in line with that taken in the IDEAS ontology, and 3D ontologies such as BFO.