There are many different ways to describe data.  I’ve seen data models that attempt to describe, conceptually, all of the data relationships for lines of business, marketing programs, fulfillment programs, and so on.  Conceptual data models are useful primarily because they give you a starting point to work with the business: first to understand, and then to communicate, how the data can represent the business’s requirements.

Normally, when creating a system, we drop down to a logical data model for that system.  We indicate the “data on the inside” and the “data on the outside”.  Effectively, the diagram starts with a large ‘box’.  Inside the box are entities needed by the application.  Outside are the entities that come from somewhere else but are referenced by the application.

One challenge, however, that appears to be stumping one of my teammates is how to create the conceptual model when there is not one system, but two or three systems that communicate.  Effectively, we are talking about a distributed system, with distributed data.  The data is not distributed because of geography, but rather to foster loose coupling.

This is a different way to look at the design of a system than is typically seen, but I feel pretty strongly that it is an important aspect, and one that we need to be fairly formal about.

I approach systems from the standpoint of the business processes first, and the use cases second.  For example, if you are creating a system that facilitates the creation of a standard business contract, it is entirely reasonable to break the process down into steps, where each step is performed by a different role.

The first step would be to define a marketing or fulfillment program that the contract will be tied to.  The second would be to create legal clauses that can fit into the document.  The third would be to create a template with rules for how the clauses are to be assembled for the particular contract type, and the fourth would be to create the contract itself.  Different people perform each step.  Each step has distinct responsibilities.  You could, if you wish, create a separate system for each.  In an SOA world, I think that you would create a set of services for each.

Each set of services is, in itself, an independent system.  In order to remain decoupled, the data may be referential, but not coupled.  Therefore, you may need to add a customer before you add an invoice, but there is NO reason that adding a customer should create data records directly in the order management database (I’m being a purist… Master Data Management is the ‘reality’ behind this situation).
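A minimal sketch of what “referential, but not coupled” can look like in code (the class and the `customer_service` interface are illustrative assumptions, not the author’s design): the invoice stores only the customer’s identifier and validates it through the owning system’s service, never through a local foreign key into a copied customer table.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    invoice_id: str
    customer_id: str  # a reference to master data owned by the customer system
    amount: float

def create_invoice(invoice_id, customer_id, amount, customer_service):
    # Validate the reference by asking the owning system, not via a local
    # foreign-key constraint into a copied customer table.
    if customer_service.get_customer(customer_id) is None:
        raise ValueError(f"unknown customer: {customer_id}")
    return Invoice(invoice_id, customer_id, amount)
```

The order system never writes customer rows; it only holds the identifier and asks the customer system when it needs to resolve it.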

So, if you are a developer who is used to creating a database with every bit of data that you think you will need in it, it can be quite a change to create not one, but many databases, bound together by master data that is copied locally on demand, and kept up to date by a cache engine (MDM).
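A rough sketch, under assumed names, of the “copied locally on demand, kept up to date” behavior described above: the local cache pulls a master record from the owning system the first time it is needed, and an MDM-style publish callback refreshes the local copy when the master changes.

```python
class MasterDataCache:
    """Illustrative on-demand cache of master data owned by another system."""

    def __init__(self, fetch_from_master):
        self._fetch = fetch_from_master  # call into the owning system
        self._local = {}                 # local copies, refreshed on publish

    def get(self, key):
        # Copy the master record locally on first use ("on demand").
        if key not in self._local:
            self._local[key] = self._fetch(key)
        return self._local[key]

    def on_master_updated(self, key, record):
        # The MDM hub publishes changes to keep local copies current.
        self._local[key] = record
```

A real MDM engine adds versioning, conflict handling, and subscriptions, but the shape is the same: read locally, master remotely.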

Now, take one of those developers and ask him or her to create a data model that illustrates not “data on the inside” but “data in each room”.  That requires a different kind of thinking… because now, the problem of ‘master data’ becomes visible (and a little painful).

In this model, the Product data is brought across both for invoices and for shipments, but is it really the Product data that is in the shipments, or is it product and lot data?  In other words, it is one thing to ask “who did we ship soap to,” and another thing altogether to ask “who did we ship Lot 41 of tainted beef to?”
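To make the distinction concrete, here is a toy sketch (the data is entirely invented for illustration) showing why shipments need lot references, not just product references: the product-level question and the lot-level question return different answers.

```python
# Invented example data: each shipment references a product AND a lot.
shipments = [
    {"shipment_id": "S-1", "customer": "Acme", "product_id": "BEEF", "lot_id": "41"},
    {"shipment_id": "S-2", "customer": "Bravo", "product_id": "BEEF", "lot_id": "42"},
    {"shipment_id": "S-3", "customer": "Acme", "product_id": "SOAP", "lot_id": "7"},
]

def customers_shipped_product(product_id):
    # "Who did we ship soap to?" -- a product-level question.
    return {s["customer"] for s in shipments if s["product_id"] == product_id}

def customers_shipped_lot(product_id, lot_id):
    # "Who did we ship Lot 41 of tainted beef to?" -- a lot-level question.
    return {s["customer"]
            for s in shipments
            if s["product_id"] == product_id and s["lot_id"] == lot_id}
```

If shipments carried only a product reference, the recall query would have to over-report every beef customer instead of just the customers who received the tainted lot.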

This distinction, between product and lot, becomes particularly visible when you model your systems this way.  More importantly, you can see the lines that cross the boundary between systems, and you can place a service on each line: get product, get lot, get invoice, get shipment.

When designing the database, you will need to use replication, a cache, or a transactional store to ensure referential integrity.

By Nick Malik

Former CIO and present Strategic Architect, Nick Malik is a Seattle based business and technology advisor with over 30 years of professional experience in management, systems, and technology. He is the co-author of the influential paper "Perspectives on Enterprise Architecture" with Dr. Brian Cameron that effectively defined modern Enterprise Architecture practices, and he is a frequent speaker at public gatherings on Enterprise Architecture and related topics. He co-authored a book on Visual Storytelling with Martin Sykes and Mark West titled "Stories That Move Mountains".

6 thoughts on “A distributed systems' logical data model”
  1. Great article Nick.  This is something I have been trying to articulate for some time now.  The idea of using the data from another system when you need it, rather than designing for every piece of information and having several systems with the same or redundant data.  I know that isn’t exactly what you were saying, but that can often be the result when trying to integrate with systems that are in place prior to the new project.  It is also the case many times as small customers become larger.  They have all of the personnel data in some system that has its own DBMS.  So why do I have to design my db for that, as opposed to just using theirs as sort of the master data and taking what I need from it when I need it?

  2. Hi Mike,

    You are correct… this is what I am saying, although the conundrum that triggered this post is about new development.  The problem is, as you correctly pointed out, the same: design to bring data in as you need it, in a decoupled manner, not with large data copies.

    — Nick

  3. I hope you don’t mind the criticism, but the manner in which you are modeling is typically something to avoid unless you have to do it, i.e. unless you are being forced to tightly couple (tie) together a set of systems, typically in a synchronous manner.

    The purpose of SOA is to hide many of the details that you are referring to here. In this example, you are discussing what I refer to as "Data As A Service", but it’s just one specialized case of SOA.

    Peace.

  4. Hi Cameron,

    As for your opinion, I thank you for it, but I respectfully submit that you missed the point.

    By modeling a logical data model, I am not making any statement whatsoever about how the data will move across the boundaries… I am stating what "system" will own the data.  

    We’d all like to think that SOA will create an environment of "brilliant pebbles" that are 100% decoupled, but that is absurd.  70% (or more) of all services are exposed by enterprise systems, not stand-alone service containers.  

    This model helps the developers to understand how the data entities are related to one another in the enterprise systems and in which system they are mastered.  That is essential to the notion of "data on the inside vs. data on the outside" that is explained by Pat Helland.

    Only after you understand this landscape is it possible to develop a large distributed business system that needs to interact across boundaries with a dozen other systems.  

    That’s not to say that there aren’t SOA services that are justifiably ‘stand alone,’ because there are.  Workflow is a great example, and I can reel off a dozen more.  But when it comes to non-infrastructure business data, these models are mission critical.

  5. OK, it sounds a bit like master data systems, particularly if you are aggregating across a number of systems to provide master data.

    I do get to do work with fairly large scale distributed systems (stock exchanges, trading systems, telco systems, etc.), but I don’t have much recent experience "implementing"; rather, I have been involved in high level architecture, probably too high level to fully appreciate some of the difficulties involved.

    Architecturally, what worried me in the picture was the presence of lines between entities owned by different systems. One cannot easily maintain referential integrity across disparate systems; perhaps those lines indicate relationships that are implicit or otherwise determinable, but not necessarily so tightly bound as I imagined?

    Peace.

  6. Yes, I’m talking about master data management.

    Your concern is completely justified.  When a line crosses a boundary, then you can no longer count on DB systems to maintain referential integrity for you. MDM helps because it provides a way to get access to the master data that is published by another system, which is critical when the destination of the transaction is either that other system or another system that will respect the same domain tables.

    The potential for ‘tight binding’ is present.  The Architect is concerned with mitigating this potential, and often does so through concepts like event-driven master data management, distributed data caching, and service oriented architecture.  

    I am also involved in high level architecture but my team has a strong desire to ‘keep it real.’ We do so by keeping each of us involved at the tangible level of an enterprise project.  I’m a massive geek and would probably float off into the realm of the fictional "correct" system if it weren’t for this connection.
