If we want to decouple a SOA system, we must get away from the notion of the remote procedure call.  In other words, our services need to have as few “command” messages as we can get away with.  This is a design philosophy but it is easier said than done.

According to Hohpe and Woolf, there are three basic message patterns.  Excerpt from their classic work Enterprise Integration Patterns:

Message intent: Messages are ultimately just bundles of data, but the sender can have different intentions for what it expects the receiver to do with the message. It can send a Command Message, specifying a function or method on the receiver that the sender wishes to invoke. The sender is telling the receiver what code to run. It can send a Document Message, enabling the sender to transmit one of its data structures to the receiver. The sender is passing the data to the receiver, but not specifying what the receiver should necessarily do with it. Or it can send an Event Message, notifying the receiver of a change in the sender. The sender is not telling the receiver how to react, just providing notification.

If you look carefully, it isn’t hard to see that the command message is sent with the understanding that something will happen on the receiving end.  More importantly, the sender KNOWS what will happen on the receiver’s end.  This is a particularly insidious form of coupling.  It is also really simple.
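
To make the three intents concrete, here is a minimal sketch of how they might look as plain data types.  The class names mirror the pattern names from the excerpt; the field names (`operation`, `document_type`, `event_type`, and so on) are my own illustrative assumptions, not something the book prescribes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CommandMessage:
    """Names an operation the sender expects the receiver to run."""
    operation: str              # hypothetical example: "CreateInvoice"
    arguments: dict[str, Any]


@dataclass(frozen=True)
class DocumentMessage:
    """Carries a data structure; the receiver decides what to do with it."""
    document_type: str          # hypothetical example: "Sale"
    body: dict[str, Any]


@dataclass(frozen=True)
class EventMessage:
    """Announces that something changed in the sender, nothing more."""
    event_type: str             # hypothetical example: "SaleMade"
    source: str


def sender_dictates_behavior(message: object) -> bool:
    """Only a command tells the receiver which code to run."""
    return isinstance(message, CommandMessage)
```

The only structural difference is the `operation` field, and that one field is where the coupling lives.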

[Figure: command message]

All kinds of things creep into a SOA message as a result of this knowledge.  If I am asking a service to “Create an Invoice” and I send data that describes an invoice to another system, I am making a lot of assumptions. 

  1. I am assuming that the receiver will succeed.  The receiver has to be present. I have not said “I have an invoice for you.”  I have said “I need you to do this for me.”  If the receiver isn’t present, then what?  Sure, I can use durable messaging, but if a message is stuck in a queue, it isn’t in ANY part of the distributed system.  It vanishes from existence until it turns up at the other end.  The invoice isn’t created… it doesn’t exist in any form… for an indefinite period of time.  NOT GOOD.
     
  2. I am assuming that the sender should, ultimately, have the right to decide if an invoice should be created.  That’s odd.  Why do I even need the receiver to create the invoice?  Answer: because the sender cannot.  Implication: the sender does not have the “right” to create an invoice.  The receiver is the “system of record,” not the sender.
     
    But if someone wants to add a new validation rule that says we won’t create invoices for customers in Germany because of a new import law that went into effect, where does that restriction go?  If we put it into the system of record (the receiver), then we have to put a rule into the sender as well, to allow it to return an error message or handle the refusal.  In effect, the sender still has intimate knowledge of the workings of the receiver. 
     
  3. I am assuming that the invoice isn’t already there and the sender is the first system to notice!  This flies in the face of reality.  It is entirely normal for the CRM system, the billing system, the shipping system, and perhaps even the portal system to be part of a “new invoice” process.  If I make the statement that “It is I, Sender of the Magic Message, Master of Invoice Creation, who has the right to demand an invoice into existence!” then what… people have to call the sender to create an invoice?  What if they don’t?  What if they call the system of record and I discover the sale later?  Will I mistakenly generate another invoice?  Or will I have to put complex rules into my code to ensure that I only generate some of the invoices that I am aware of, but not others, because I know that other systems have ordered the creation of the invoice?  Unmaintainable.  
     
  4. I assume that creating the invoice should happen RIGHT NOW for both my system and the receiver.  That may be convenient for me, but not for the receiver.  In fact, it may be wildly unreliable for the receiver to process messages as they come in the door.  Or perhaps some messages happen right away but others take too long.  This places an artificial constraint on the system of record: do what you want, as long as it doesn’t take more than 100 milliseconds.  This is seriously tight coupling. 

Each of these assumptions exists in a Remote Procedure Call.  They are forms of coupling, pure and simple.  They fly in the face of SOA.
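
The assumptions above can be sketched in a few lines.  This is an illustration only: `BillingService`, `InvoiceRejected`, and `record_sale_with_command` are hypothetical names, and the Germany rule is the example from assumption 2.  Notice how the sender has to know the receiver's failure modes in order to call it at all.

```python
class InvoiceRejected(Exception):
    """Raised by the (hypothetical) system of record when it refuses a sale."""


class BillingService:
    """Stand-in for the system of record that owns invoices."""

    def __init__(self):
        self.invoices = {}

    def create_invoice(self, sale):
        # A business rule that lives in the receiver...
        if sale["country"] == "DE":
            raise InvoiceRejected("import restriction")
        number = f"INV-{len(self.invoices) + 1}"
        self.invoices[number] = sale
        return number


def record_sale_with_command(billing, sale):
    """The sender issues a command and waits for the answer right now."""
    try:
        return {"status": "invoiced", "invoice": billing.create_invoice(sale)}
    except InvoiceRejected as reason:
        # ...still leaks into the sender, which must understand the refusal.
        return {"status": "rejected", "reason": str(reason)}
```

Every new rule in `create_invoice` is a potential new branch in the sender.  That is the coupling.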

So what to do?  How do you avoid making SOA endpoints that are commands?  If I want to offer the ability to create an invoice to the enterprise, what should my endpoint look like?

You have two choices: event driven and document driven.

Event driven looks like this:

[Figure: event driven]

First of all, there is an event that you need to subscribe to.  It is not the event of “invoice created” because the sender is not allowed to create invoices.  Therefore, we need the system of record to subscribe to a different event… but what event?

What event occurs in a business that says “create an invoice”?  How about “we made a sale”?  Think of the subtle difference.  An invoice is a document.  We use it to track the sale.  We assign a number to it and we look up other sales etc.  But it isn’t the BUSINESS event.  The business didn’t make an invoice.  The business made a sale.  Operations people made an invoice to track the sale (long before computers came along).  One tidbit: the fact that the system of record can reject the transaction means that this is an “unapproved sale.”

Notice Steps 2 and 3.  The event message usually doesn’t contain sufficient information for a system of record to fulfill its responsibilities.  It subscribed to the event, and therefore discovered it, but it needs to call back to the source system to get the actual data to act upon. 

Notice Step 4 above.  There are two subscriptions back to the sending system.  This handles the case where the sale wasn’t allowed.  It is the system of record that denies the sale.  

There is an interesting bit of coupling still going on here.  The source of our sale came from a system that is not aware of the rules surrounding a sale.  It has a simple task: collect data and start a sale transaction.  However, we had to subscribe to two kinds of events, didn’t we?  We had to subscribe to both “Invoice” and “Sale Denied”.  That means that we had to tell that source system that there were two possible status values for a sale, and we had to subscribe to each. What if a third comes along?  What if the business wants to change the rules to allow a new kind of sale… one that doesn’t generate an invoice OR a sale-denied message?  We’d have to change BOTH systems.
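
Here is a rough, in-memory sketch of that event-driven exchange, under some loud assumptions: the `EventBus` is a toy that delivers synchronously, the class and event names are hypothetical, and the step numbers from the figure are not reproduced.  What it does show is the thin event, the call back for data, and the two subscriptions on the source system.

```python
from collections import defaultdict


class EventBus:
    """Toy publish/subscribe bus; a real one would be asynchronous and durable."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self._subscribers[event_type]):
            handler(payload)


class SalesSystem:
    """Source system: collects data and starts the sale transaction."""

    def __init__(self, bus):
        self.bus = bus
        self.sales = {}
        self.status = {}
        # Two subscriptions, one per outcome the source has been told about.
        bus.subscribe("Invoice", lambda e: self._set(e, "invoiced"))
        bus.subscribe("SaleDenied", lambda e: self._set(e, "denied"))

    def _set(self, event, status):
        self.status[event["sale_id"]] = status

    def record_sale(self, sale_id, data):
        self.sales[sale_id] = data
        self.status[sale_id] = "unapproved"
        # The event is thin: it says what happened, not what to do.
        self.bus.publish("SaleMade", {"sale_id": sale_id})

    def get_sale(self, sale_id):
        return self.sales[sale_id]


class SystemOfRecord:
    """Owns invoices; decides on its own whether a sale is allowed."""

    def __init__(self, bus, sales_system):
        self.bus = bus
        self.sales_system = sales_system
        bus.subscribe("SaleMade", self._on_sale_made)

    def _on_sale_made(self, event):
        # Call back to the source for the data the event did not carry.
        sale = self.sales_system.get_sale(event["sale_id"])
        outcome = "SaleDenied" if sale["country"] == "DE" else "Invoice"
        self.bus.publish(outcome, {"sale_id": event["sale_id"]})
```

Adding a third outcome means a new `publish` in `SystemOfRecord` and a new `subscribe` in `SalesSystem`.  Both systems change together.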

Both systems are coupled on the business process itself… the business process isn’t in the diagram but it definitely affects the design.  So how do you decouple from that?  Let’s look at Document based messaging.

[Figure: document driven]

First thing to notice: the number of ports and channels is a LOT simpler and we don’t spend nearly the same amount of time “chatting” about things.  However, in this model, the responsibilities of both systems are VERY different.  This is an architectural design change.  This kind of change CAN be added to a system later, but it is more expensive than if you add it up front.

In this model, we don’t send events at all.  Notice that.  We send documents and the documents have a transaction id that is carried from point to point.  As the document goes from describing an unapproved sale to describing an invoice (and later to a shipment), you carry one transaction id along the way.  This is your correlation identifier.  This allows each of the systems to perform activities based on their own business processes, without needing to know anything about the business process implied in the other system.

Notice that we are down to one response subscription, and it isn’t even a specific subscription.  It basically says “For any transaction that started with me, or that I touched, please send me back any documents related to it so I can update my status.”  It is very simple.
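
A minimal sketch of the document-driven version, again with assumptions called out: `DocumentRouter` is a toy content-based router, the field names (`transaction_id`, `status`, `origin`) are hypothetical, and delivery is synchronous for brevity.  The point is the single generic rule on the source system, keyed on the transaction id.

```python
class DocumentRouter:
    """Toy content-based router: delivers a document to every matching rule."""

    def __init__(self):
        self._rules = []

    def subscribe(self, predicate, handler):
        self._rules.append((predicate, handler))

    def send(self, document):
        for predicate, handler in list(self._rules):
            if predicate(document):
                handler(document)


class SalesSystem:
    """Source system with one generic subscription keyed on transaction id."""

    def __init__(self, router):
        self.router = router
        self.started = set()
        self.latest_status = {}
        # "For any transaction that started with me, send me related documents."
        router.subscribe(self._concerns_my_transaction, self._on_document)

    def _concerns_my_transaction(self, doc):
        return doc["transaction_id"] in self.started and doc["origin"] != "sales"

    def _on_document(self, doc):
        self.latest_status[doc["transaction_id"]] = doc["status"]

    def record_sale(self, transaction_id, data):
        self.started.add(transaction_id)
        self.router.send({
            "type": "Sale",
            "transaction_id": transaction_id,  # the correlation identifier
            "origin": "sales",
            "status": "unapproved",
            "body": data,
        })


class SystemOfRecord:
    """Applies its own rules; the document already carries the data it needs."""

    def __init__(self, router):
        self.router = router
        router.subscribe(
            lambda d: d["type"] == "Sale" and d["status"] == "unapproved",
            self._on_unapproved_sale,
        )

    def _on_unapproved_sale(self, doc):
        status = "denied" if doc["body"]["country"] == "DE" else "invoiced"
        # Same transaction id travels on; only the description changes.
        self.router.send({**doc, "origin": "billing", "status": status})
```

If the business adds a new status, `SalesSystem` needs no new subscription.  The same rule picks the document up, and only the code that interprets `status` has to care.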

The simplicity is a bit deceiving.  If we need to trigger another event on the transaction source, we need to put in some logic for that, but it is not a substantial change over the event-driven approach.  The logic is simply contained in the app instead of the messaging system.

Conclusion

So which method is better?  Should we use commands, events, or documents? 

I’m a big fan of simplicity.  And the simplest method, in the end, is document driven.  Unfortunately, due to the architectural changes needed to make it happen, we often cannot start there. 

So, for a migration plan, take legacy systems and make them event driven.  If you are building new systems: make them document driven.  Either way… kill the command message!  Avoid that mess.

By Nick Malik

Former CIO and present Strategic Architect, Nick Malik is a Seattle based business and technology advisor with over 30 years of professional experience in management, systems, and technology. He is the co-author of the influential paper "Perspectives on Enterprise Architecture" with Dr. Brian Cameron that effectively defined modern Enterprise Architecture practices, and he is a frequent speaker at public gatherings on Enterprise Architecture and related topics. He coauthored a book on Visual Storytelling with Martin Sykes and Mark West titled "Stories That Move Mountains".

15 thoughts on “Killing the Command message: should we use Events or Documents?”
  1. I think your figures are a bit biased towards document messages; the event figure is almost as complicated as it gets,  while the document figure is too simplified. Even your document based solution might need to ask for more data from the publisher or other services, and the invoice service can still fail.

    The publisher of an event does not need to subscribe to success+failure events; it can just do what is common in workflow solutions: start a new process (workflow) and assume that it will eventually succeed, and that there is some monitoring (BAM) and a "dead-letter queue" that is handled to pick up on errors. The publisher only needs "handle external event" if the two autonomous, asynchronous processes need to be joined at checkpoints. Otherwise, the sale process just asks for invoice data when that is naturally a part of its process. After all, the publisher might not even need to know that an invoice gets created – as the event message never contains any defined handling operations.

    Using workflow for long-running processes is a proven concept, and I like to apply this design-style to SOA systems.

    I think a combination of event+document is better; it removes the need for steps 2 and 3 in your figure. And if you fire-and-forget the event and rely on monitoring, steps 4 and 5 are also gone, at the cost of a BAM mechanism.

  2. @ Kjell-Sverre,

    Perhaps my document figure is missing a possible call-back, since I was aiming for a ‘normal’ configuration, but I did capture 100% of the requirements of a normal configuration.  

    I did handle the errors in the generic subscription because the error would have the same transaction id as the sale, so it would be picked up by the same rule.

    That said, I don’t think it makes sense to eradicate events.  As I stated, events have their place, and so do command-response messages (within an application domain boundary).  I do believe, however, that document based routing is quite powerful and still simpler than typical routing scenarios based entirely on events.  

    Basically, it is "big events" with "rule based filters" but that doesn’t sound as catchy as "document messaging."  🙂

    Thanks for the feedback.

  3. I am not convinced on the document model when dealing with other situations.

    Unless I am misunderstanding it, it seems like it could easily introduce odd dependencies or thoughts of doing things like "merging" of documents when dealing with parallelism. The event model with granular messages seems to be a much better fit for parallelism as it is easily defined in the pi-calculus.

    Could you perhaps explain parallelism in a document passing based system?

  4. Splendid posting, Nick. There is however one thing that I don’t grasp. In my philosophy events are at the basis of documents where documents – in an ideal situation – contain a description of the full event context, including – functional – correlation identifiers like your tx identifier. So I come to "SaleMade", "SaleApproved" and "SaleDenied" documents. Is this in contrast with your philosophy?

    I agree with you saying in the reply above: Basically, it is "big events" with "rule based filters" but that doesn’t sound as catchy as "document messaging."  

    That looks a lot like my posting on loosely coupled process flows: http://soa-eda.blogspot.com/2007/06/how-to-implement-loosely-coupled.html

    Jack

  5. Hi Jack,

    In order to do content based routing, some things have to be fairly stable.  In EDI, everything has an envelope and a header.  We inspect the envelope for addressing and the header for context of the message.  By creating three kinds of messages for SaleMade, SaleApproved, and SaleDenied, you have put the state of the sale into the header, and that is fine to a point.

    I did not do that because I want the receivers to create a single subscription on the document type (Sale) and have them place a rule with the routing system asking for the documents in the desired status.  Something akin to "Subscribe( Sale, Status: SaleMade | SaleDenied)"

    The key thing is that the "Routing effect" of both schemes is the same.  The subscriber receives ONLY the documents they want.  

    Having two different documents has the advantage that you can share completely different schema as the status changes.  I did not capture that effect in the described mechanism above, but it is a valuable one.  On the other hand, having one document with status allows a system to receive messages that didn’t exist when it was designed simply by changing a routing rule.  The schema of the inbound message is the same, and the receiving system can ignore the status and use other fields… like the remote system’s invoice document number.  

    Architecturally, the exchange pattern is the same between your approach and the one I described.  It’s an interesting tradeoff.  I’ll have to think about what criteria I would describe for deciding between the two mechanisms.  Both are strong.

    Your feedback is always of very high value, Jack.

    — Nick

  6. Hi Nick,

    Great article, keep it coming.

    All right, firstly, the business event that says "create an invoice" is "bill the customer" and not "we made a sale". Invoicing can be on order or pre/with/post delivery, and either automated, semi-automated, or manual depending on the defined business processes; "we made a sale" is a different but related business event.

    Secondly, in the event driven example, I suggest that we should make "invoiced" and "sale denied" an enumerated status of the sale – and the mechanics in play would be that both parties shown would subscribe to the change of status events, rather than each individual event. Therefore, any change of the status would be or should be treated as a business event. Some EDA purists would perhaps disagree with such assimilation, but to me it brings flexibility and simplicity – just as you highlighted for the document based example.  In fact, in one of the systems I am designing the end-user can introduce their own enumerated statuses, which can be changed by any part of the related business process/workflow and in turn trigger other processes/changes.

    Also, I tend to think that you really have oversimplified the document based approach. For example, don’t document based approaches bring concurrency issues? Also, a document based approach pushes you towards a linear execution plan, which is in contrast to event based archetypes that excel in state-machine type interactions (and therefore more dynamic and flexible business flows).  

    Lastly, I see that you have put a bus type middleware in both the document and event based approaches – how much does that weigh in the bigger picture of the architecture? And is the underlying implementation you recommend to kill the command message pub/sub or asynchronous consumption?

  7. Just on the issue of having multiple kinds of events, using a hierarchical-topic based infrastructure, you could have systems subscribe to the top-level "sale" topic, or to the specific sub-topics: "sale made", "sale approved", "sale denied".

    Would that change your thoughts about the event-based solution?
