4D Ontology – Pt1 – The Why

This is (hopefully) the first in a series of blogs introducing 4D ontology. I’ll try to write them as regularly as possible, but I spend most of my time designing and developing a secure data platform for 4D data, so the blog output might be a bit sporadic. I wanted to write about this because I think it’s important. I’m also aware that there are a lot of people keen to learn more about these 4D models, and it’s hard to know where to get started. I’ve been at this for nearly 30 years, but I’ll still defer to Chris Partridge on some of the really difficult stuff. So I’ll be the first to admit it’s not easy, but that’s what makes it worthwhile: if you want to make a difference, choose to do the hard things. It can be OK to cut corners if you know why you’ve done it and that it’s going to cause technical debt you’ll need to go back and fix. It’s not OK to take a simplistic approach to difficult problems.

First of all, a quick definition of ontology. You’ll find quite a few definitions on the web, and they don’t all agree with each other, but a fairly short and broad definition is that an ontology is a model of something you’re interested in. How that model is built will depend on what you want to do with it. This means that ontologies that cover the same subjects can still vary significantly. Some are little more than taxonomies or vocabularies; they focus on terminology and are generally not considered to be formal ontologies. Others are designed for machine reasoning purposes, and can be very formal in their structure. Reasoning ontologies can be somewhat counter-intuitive in that they often sacrifice the requirement to accurately model the world in order to be computationally tractable. 4D ontologies don’t usually major on reasoning. Instead, they are designed to closely model the real world. We can cover the reasons for not reasoning in later blogs, but we need to cover more ground first. These are not hard and fast distinctions: there are 4D ontologies out there that have been designed for reasoning purposes, and there are also taxonomies (reference data libraries) that are used to extend 4D ontologies. I find that the 4D approach is intuitive (once you’ve un-learned old ideas) and solves a lot of problems in data management, especially where data is being integrated from multiple sources.

So why on earth would we need a formal ontology (4D or otherwise) if it’s so challenging? There is a danger with data modelling and ontology work that everyone thinks it’s easy. It’s just boxes with familiar-looking words in them and lines between them, surely? Well, no it isn’t. The goal of a good ontology is to provide a high-fidelity model of the world you’re interested in. The idea goes that if your model closely tracks the things in the real world, the software you build around that model will do a better job of solving your business problems. I’ve lost count of the number of systems that have failed simply because the data model at their heart was incapable of representing the information the users needed to work with. These failures are sometimes found at user acceptance testing, but the really pernicious ones are those which go into production and start to skew the way the business works. This can be because the users abuse the system, ramming data into any field they can because the one they need isn’t in the model. It’s really easy to blame users for data quality problems, but it’s always the model that’s the original cause. Aside from the huge costs incurred from poor data quality, it can get even weirder: rules enforced by databases somehow become folklore in enterprises. Generations of users come to think of these rules as “business constraints” when in fact they’re just the real-world consequence of a really poorly designed data model. This can go on for decades, and any attempt to relax the rule will be called “breaking the business”. The users know no other way. It’s Stockholm syndrome.

Having systems that are more semantically rich saves money and lives. If you think I’m exaggerating, let me tell you about a meeting I dropped in on a few years back. The purpose of the meeting was to improve the semantics of a draft military data standard a number of nations were working on. The editor of the standard was there, as was a representative from one of the nations who had a lot of concerns about the quality of the model. Let’s call them “John” and “Gisela”. The conversation went something like:

Gisela: I have many concerns about this model
John: Oh really? Such as?
Gisela: For example “indirect_fires” [rockets,mortars,artillery]
John: It has been widely reviewed.
Gisela: It has a field named “location” – a latitude and longitude
John: Yes, yes, we have standardised on decimal lat longs to WGS84
Gisela: Yes, but is this the location of the weapon or the target?
John: Um…
Gisela: To me, this is important

I understand that model has now been completely re-worked. What they realised was that it’s just not good enough to put words in boxes and draw lines between them. Getting this stuff right means really thinking about what you’re doing, and having the tools of formal logic and philosophy at your disposal. Because data modelling is hard, most organisations don’t have the expertise to know when the consultant they’ve hired knows less than they do — they don’t have a Gisela. Hopefully this series of blogs will help more folks find their inner Gisela.
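The ambiguity Gisela spotted can be sketched in a few lines of Python. This is only an illustration: apart from “indirect_fires” and “location”, every name here (LatLon, weapon_location, target_location and so on) is made up for the example, the coordinates are invented, and it is neither the real standard nor a 4D model. It just shows how a single field whose meaning isn’t pinned down lets two senders record the same event in two different ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatLon:
    """Decimal latitude/longitude, assumed to be WGS84."""
    lat: float
    lon: float


@dataclass(frozen=True)
class AmbiguousIndirectFire:
    """Mimics the draft record: one location, with no stated meaning."""
    weapon_type: str
    location: LatLon  # weapon or target? The model doesn't say.


@dataclass(frozen=True)
class IndirectFire:
    """Hypothetical rework: each location names the thing it locates."""
    weapon_type: str
    weapon_location: LatLon
    target_location: LatLon


# Invented coordinates for a single firing event.
weapon = LatLon(51.0, -1.5)
target = LatLon(51.2, -1.1)

# Two senders describe the SAME event, but read "location" differently.
from_sender_a = AmbiguousIndirectFire("mortar", weapon)  # meant: the weapon
from_sender_b = AmbiguousIndirectFire("mortar", target)  # meant: the target

# The records disagree, and nothing in the data says which reading applies.
print(from_sender_a == from_sender_b)

# With explicit fields there is nothing left to guess.
explicit = IndirectFire("mortar", weapon_location=weapon, target_location=target)
print(explicit)
```

Both ambiguous records are perfectly valid against the draft model, which is exactly the problem: the schema accepts them, and the misunderstanding only surfaces when someone acts on the data.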

Up to this point, I think I’ve made a reasonable case for formal ontology work: taking your data modelling seriously and applying the tools that mathematicians and philosophers have been using for years, but which are often ignored by software professionals. I haven’t made the case for 4D though, and that will take a few more blogs to achieve. However (spoilers) I believe the 4D approach gives you a way to be much less ambiguous about how you represent facts. Reducing ambiguity increases data quality, and better enables integration of data that conforms to different models and standards. It’s hard to learn, but the rewards are great if you’re a data professional.

Part 2 (on spacetime!) is here.