This is the second part of a series — the first part is here.
One of the big problems in data modelling and ontology is knowing what you’re talking about. I’m not talking about the self-appointed “thought leaders”; I’m talking about identifying the things in your ontology. How can we be sure that two people, or two systems, are referring to the same thing? If you say “pump P101”, is it the same P101 that I know? Maybe there’s some context available to let us know that both of these are in the same chemical plant, and it would make sense that they’d have a unique identification scheme within the plant. But even then, is it really the same pump? P101 is what process engineers call a “tag number”: it refers to a functional location in the plant where you’d expect to find a pump. Over the years, though, the pump at that location will be swapped out a number of times. In any given year, you could reasonably expect to see a different serial number on the pump. Is it a different pump? Welcome to the Ship of Theseus problem, otherwise known as “Trigger’s Broom” in the UK. Identity is a complex problem, and it’s not one that can be solved by simply attaching an ID string to your data.
There’s also the problem of two things having the same ID. Believe it or not, some chemical plants will have two or more P101s, usually in parallel process trains that all follow the same design. P101 could refer to the same functional position in, say, four different process trains. We’ll cover identifiers and names in a later blog. For now, let’s stick to figuring out what we’re talking about.
Another identity problem exists where we want to talk about historical facts. My weight now (unfortunately) is not the same as it was in my 20s. I may have also changed name, employer, gender, etc. during that time. If you’re lucky, you might have an HR system in your organisation that has a field for previous surnames (in the event of marriage) but it’s unlikely to have previous gender, or be able to recognise more than one previous name. Similarly, it’s unusual but not impossible to find a “previous address” field but there will likely only be one of them.
A few years back, I was part of a team of 4D experts that had been called in to look at a problem in a multinational company. That company had bought an ERP system. That ought to be bad enough, but worse still, they had allowed various parts of the organisation to configure it however best suited them. I believe this was advice from a consulting firm. Predictably, there were identity problems: the various teams had valid business reasons to hold data about the same things, but had used completely different identity schemes. The result was that the C-suite were now completely blind. They couldn’t bring together the data from the various parts of the business because there was no common form of identification. They were spending a lot of money (funnily enough, with the consultants) to manually triage the data for the management dashboards. As we dug into it further, it emerged that the systems also had supplier and customer tables with the same organisations in both. This was causing problems within a single team, and the problem was many times worse when factored across multiple teams.
So how does 4D help? 4D ontologies are extensional: an object’s extension in spacetime defines its identity. Objects are unique based on their spatio-temporal extent, and no two objects can occupy the same chunk of spacetime. This approach has the advantage of doing away with arguments about names and identifiers, because we have a rock solid way to identify things. Looking back at our friend P101, let’s assume there are two P101s in two different process trains on the same plant. We know these are different because they occupy completely different extents in spacetime. Each one of them has a spatial extent given by its position in the plant, and a temporal extent that begins when the train is commissioned and ends when the train is decommissioned. During that time, there will have been a number of individual, serial-numbered pumps in that P101 location. Each of those has the same spatial extent but a different temporal extent, so not the same identity.
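To make the extensional idea concrete, here is a minimal Python sketch. It is only an illustration, not how any particular 4D ontology or toolkit implements identity. The class names `Extent` and `Thing`, the region labels, and the years are all made up for the example, and a spatial extent is crudely reduced to a text label.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Extent:
    """A chunk of spacetime: a spatial region plus a time interval."""
    region: str          # hypothetical label standing in for a real spatial extent
    start: int           # illustrative year the extent begins
    end: Optional[int]   # illustrative year it ends; None means still ongoing


@dataclass(frozen=True)
class Thing:
    """Identity comes from the extent alone; names are just attached labels."""
    extent: Extent
    # compare=False keeps names out of equality and hashing
    names: frozenset = field(default_factory=frozenset, compare=False)


# Two functional locations that share the tag "P101" but sit in different trains
p101_train1 = Thing(Extent("train-1/pump-bay", 1995, None), frozenset({"P101"}))
p101_train2 = Thing(Extent("train-2/pump-bay", 1995, None), frozenset({"P101"}))

# The same chunk of spacetime described under a different name
feed_pump_slot = Thing(Extent("train-1/pump-bay", 1995, None), frozenset({"Feed pump"}))

# A serial-numbered pump occupying the train 1 location for only part of its life
installed_pump = Thing(Extent("train-1/pump-bay", 2003, 2009), frozenset({"124ABC"}))

same_tag_same_thing = p101_train1 == p101_train2         # same name, different extent
different_name_same_thing = p101_train1 == feed_pump_slot  # different name, same extent
location_is_pump = p101_train1 == installed_pump         # same region, different time span
```

Here the two P101s compare as different despite sharing a tag, the differently named record compares as the same thing, and the installed pump is distinct from the location it sits in because its temporal extent differs.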
But what if, in the tradition of Theseus, one of those serial-numbered pumps (124ABC) was removed from the first train P101 position, repaired, then later installed in the second train P101 position? Well, again, 4D has got you covered. One of the key concepts in a 4D ontology is that of states. States are temporal parts of an object: all of the spatial extent, but for a particular period of time. In this case we have identified probably five states of our 124ABC pump: a state when it was installed in the first train, a state when it was broken, a state when it was being repaired, (likely) a state when it was in storage, and another state when it was installed in the second train P101 position.
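The five states of pump 124ABC can be sketched in Python as a list of temporal parts. This is a rough illustration under stated assumptions rather than a definitive model: the `State` class, the `state_at` helper, the location labels, and every year below are hypothetical and chosen only so the states line up end to end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class State:
    """A temporal part of a whole object, covering the half-open interval [start, end)."""
    whole: str                      # the object this state is a temporal part of
    kind: str                       # what was going on during this period
    start: int                      # illustrative year, inclusive
    end: Optional[int]              # illustrative year, exclusive; None means ongoing
    location: Optional[str] = None  # functional location, if installed

    def covers(self, year: int) -> bool:
        return self.start <= year and (self.end is None or year < self.end)


# Hypothetical timeline for pump 124ABC; the years are made up for the example
pump_124abc: List[State] = [
    State("124ABC", "installed", 2001, 2005, "train-1/P101"),
    State("124ABC", "broken", 2005, 2006),
    State("124ABC", "being repaired", 2006, 2007),
    State("124ABC", "in storage", 2007, 2009),
    State("124ABC", "installed", 2009, None, "train-2/P101"),
]


def state_at(states: List[State], year: int) -> Optional[State]:
    """Return the state covering the given year, or None if the object has no state then."""
    for state in states:
        if state.covers(year):
            return state
    return None


early = state_at(pump_124abc, 2003)
repair = state_at(pump_124abc, 2006)
late = state_at(pump_124abc, 2010)
before_it_existed = state_at(pump_124abc, 1999)
```

The pump keeps a single identity throughout, while the question "where is it, and what is happening to it?" is always answered for a particular time by picking out the relevant state.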
States also give us a way to manage problems like multiple previous addresses, previous names, previous weights, etc. They also provide a really consistent way to manage interactions (activities) involving multiple parties, but more about that in another blog. The spacetime diagram format used above will feature often in these blogs. These diagrams are similar to the Minkowski diagrams used in physics, but with time on the x axis and space on the y axis. The spatial dimension is topological and does not attempt to reflect any distances between objects; however, if an object is stationary relative to another, their world-lines should be parallel. In the above example, the facilities (grey world-lines, strictly “world volumes”) do not move relative to each other or to our frame of reference (the earth), so they are parallel and flat. Spacetime diagrams feature extensively in the books Developing High Quality Data Models by Matthew West and Business Objects: Re-Engineering for Re-Use by Chris Partridge.
Hopefully this wasn’t too hard to understand as a primer. I’ll cover more in later blogs, but if you feel any of these points need more explanation, let me know. There’s also the question of identity when it comes to types of things, and we’ll cover that in the next blog.
The P101 example goes all the way back to the 90s and the PISTEP programme. The process industry was looking for solutions to manage thorny, safety-critical problems such as this in a consistent way. The late Dr Matthew West came up with this example and proposed the first data modelling approach using something very similar to what we now call states. It was at a chance meeting with Chris Partridge at a conference that he realised what he was doing was a 4D approach.
The problem of identity troubled Leibniz and is documented in his correspondence with Clarke about the issues he perceived in Newton’s theories. It wasn’t until Einstein and Minkowski (or McTaggart if you prefer philosophers to physicists) started to think about spacetime that some of these issues of identity could be resolved.