One challenge in designing Event Driven SOA exchanges is to decide how big your event messages should be. I have an approach that I believe works, and can leverage information that probably already exists in your organization… in the data warehouse.
First, a quick introduction to Event Driven SOA, for those folks who aren’t familiar with the concept. (Note: some folks use the term SOA+EDA to refer to this idea.)
Event driven SOA is a subtype of Service Oriented Architecture that I believe is one of the few ways to build a good distributed system using SOA principles. You start with the core principles of SOA and you add an additional constraint: messages will be passed using async publish-subscribe mechanisms to the greatest extent possible.
A great overview of the distinction between SOA and Event Driven SOA is provided by Jason Bloomberg in this Zapthink article. An excellent source of information on EDA as an extension to simple SOA can be found by reading the blog posts of Jack van Hoof, a talented SOA architect and proficient blogger.
One reality with Event Driven SOA is that you need to publish events to an infrastructure, like an ESB. Those events can be sent to any number of subscribers, who will act on them.
One principle that I espouse is that your event messages need to have “sufficient information in order for the subscriber to decide if it should consume or discard the event without requesting further information from another system.”
Another principle, and one that may appear to be in conflict with the above principle, is that event messages should be as small as possible, but no smaller.
This is important, because the subscribers may not be interested in all events of a particular type, but rather all events of a particular subtype. If the event message is too small, there may not be sufficient information for a subscriber to decide if they should act on the event, or throw it away.
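To make the "consume or discard" decision concrete, here is a minimal in-memory sketch. The `Bus` and `Subscriber` classes, the `fulfillment_type` field, and the subscriber name are all hypothetical stand-ins invented for illustration; a real ESB would look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Subscriber:
    name: str
    wants: Callable[[Event], bool]  # decides using the message contents alone
    consumed: List[Event] = field(default_factory=list)

    def on_event(self, event: Event) -> bool:
        """Consume the event if it looks relevant; otherwise discard it."""
        if self.wants(event):
            self.consumed.append(event)
            return True
        return False


class Bus:
    """Toy stand-in for an ESB: offers each event to every subscriber."""

    def __init__(self) -> None:
        self.subscribers: List[Subscriber] = []

    def subscribe(self, subscriber: Subscriber) -> None:
        self.subscribers.append(subscriber)

    def publish(self, event: Event) -> List[str]:
        """Return the names of the subscribers that consumed the event."""
        return [s.name for s in self.subscribers if s.on_event(event)]


bus = Bus()
# Hypothetical subscriber that only cares about one subtype of "new sale".
bus.subscribe(
    Subscriber("online-reporting", lambda e: e.get("fulfillment_type") == "online")
)

rich_event = {"type": "new sale", "fulfillment_type": "online"}
thin_event = {"type": "new sale"}  # too small: the subtype cannot be determined

print(bus.publish(rich_event))
print(bus.publish(thin_event))
```

With the thin event, the subscriber has no way to tell whether the sale is one it cares about, so in this sketch it simply drops it. In practice it would have to call another system to find out, which is the cost the principle is trying to avoid.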
It is less expensive to put a reasonable amount of information in the event message, so that the subscriber doesn’t have to call back to the event publisher before deciding if it should discard the event or use it. But how much information is “a reasonable amount?” I can answer that.
Let’s say that you have a customer relationship management (CRM) system. You want an entry to be added to the CRM system when a customer buys a product, regardless of how the customer got the product (from a partner, or a retailer, or an online promotion, etc). This way, when the customer calls with a problem, you know what products they own, when they purchased them, etc.
But let’s say that your company has locations in Europe and the USA, so you have implemented two CRM systems. The CRM system in Europe handles information about customers in the eastern hemisphere, while the CRM system in the USA handles information about customers in the western hemisphere.
A new sale is made. An event message goes out: “new sale is made.” The CRM system in Europe gets the message. Does it act on the event? That depends… is the customer in Europe?
How much information is enough… or too much?
SOA only works if the event subscriber is independent of the publisher. The two must be loosely coupled.
However, with the scenario above, it would be really tempting to say that we should put some kind of ‘hemisphere code’ into the ‘new sale’ event. That way, the subscribers (CRM systems) could decide if they should discard the event or act on it. Adding this field, on the other hand, is coupling. We have allowed the information requirements of a subscriber to influence the messages sent by the publisher.
So, how many times can we do this? What if we add 10 new subscribers, and each needs some variation (one wants a customer id, a second wants to know the marketing program tied to the agreement that is tied to the sale, another wants to know what family of products were included in the sale)? If you meet all of their needs, the event message will be huge.
There has to be a fine line that allows you to send a message that is “big enough for most folks” but still requires some systems to occasionally ask for further data before proceeding.
The answer lies in the data warehouse schema
Ever seen a star schema? It is a way of structuring information to allow a tremendous amount of flexibility in reporting, and is used extensively in business intelligence and data warehousing. In the center of a star schema is a ‘fact table’ that contains a row for every ‘fact.’ This row usually has a numeric value in it along with identifiers from a set of ‘dimension tables’ that surround it. Each dimension may have another dimension behind it, creating a ‘star’ pattern.
For our scenario, I created the following star schema. Note: this is off the top of my head, and is not indicative of any actual system, in Microsoft or anywhere else.
The fact table is labeled ‘Sale’ and sits at the center of the star. The numeric ‘fact’ is probably the dollar amount allocated to the sale of a particular item on an invoice. (Commonly, in the warehouse, each row of an invoice is represented independently, to allow information to be easily aggregated by product.)
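For readers who have not worked with a star schema, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions made up for illustration, and only two dimensions are shown. It has one fact row per invoice line, which makes aggregating by product a simple join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical, heavily simplified).
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per invoice line, pointing at its dimensions.
CREATE TABLE sale (
    sale_id     INTEGER,
    product_id  INTEGER REFERENCES product(product_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      REAL    -- the numeric 'fact'
);

INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO customer VALUES (10, 'Contoso');
INSERT INTO sale VALUES (234434, 1, 10, 100.0),
                        (234434, 2, 10,  50.0),
                        (234435, 1, 10,  25.0);
""")

# Aggregate the fact by one dimension: total sale amount per product.
totals = conn.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sale AS s
    JOIN product AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(totals)  # [('Gadget', 50.0), ('Widget', 125.0)]
```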
Of course, deciding what dimensions to use in your fact table requires a great deal of analysis. The effort to create the dimensions requires some understanding of “what things are important to the business.” These star schemas change fairly rarely, because once they are in place, and a million facts are loaded up, it can take a huge amount of effort to add a new dimension.
And this stability is useful. We can use it to create the decoupling that we need.
The right sized event message
If you look at the Sale schema in the data warehouse, as illustrated above, there are a series of tables that form the first tier in the dimensions. I have colored them orange, and they include things like product, date, fulfillment type, agreement, and customer. Clearly, these are things that are important to the business.
I maintain that the ‘right sized event message’ will contain information from most, if not all, of the dimensions that the event would require in a data warehouse. Therefore, for this particular event: “new sale,” a good XML structure may be:
Some notes here:
- I formatted the message above using a simplified, readable XML layout. I am not implying that event messages must be in any particular format. That is up to you. I’m just trying to be readable on my blog.
- Careful readers will notice that one of the dimension tables is ‘product’ but I don’t have a list of products in the event message. That is because there can be quite a few products in a ‘sale.’ I try to avoid putting an open-ended list into an event message. Clearly, there are tradeoffs here. (Aren’t there always?)
- Clearly, the star schema contains branches. I didn’t reflect every branch beyond the first tier dimensions. This is an effort to keep the message small.
- The Sale has a sale id of ‘sale/234434’ which is not a full URI. I’m not making a statement for, or against, using a URI for an enterprise identifier. That is up to you. Similarly, I used a GUID to identify the event message itself. Whatever id you use to identify the message itself, that is up to you.
- You have undoubtedly noticed that the event message contains identifiers for the elements to be linked to, not full details. Therefore, the subscribing system will need to be able to access the values for customers and/or agreements from the messaging infrastructure if that information is needed. There is no attempt to be comprehensive. Simply sufficient.
- You will notice that I put the facts about the sale value into the event message. This is entirely optional. However, it is reasonable to predict that a message consumer may want to differentiate the message based on the facts that sit at the center of the star schema.
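To make these notes concrete, here is a minimal sketch that builds a message of roughly this shape with Python's standard `xml.etree.ElementTree`. Every element and attribute name, and every identifier other than `sale/234434`, is an assumption chosen for illustration. This is not the original layout, just one shape that is consistent with the notes.

```python
import uuid
import xml.etree.ElementTree as ET


def build_new_sale_event(sale_id, customer_id, agreement_id,
                         fulfillment_type, sale_date, amount, currency):
    """Build a 'new sale' event that carries identifiers, not full records."""
    # A GUID identifies the event message itself.
    event = ET.Element("event", {"id": str(uuid.uuid4()), "type": "new-sale"})
    sale = ET.SubElement(event, "sale", {"id": sale_id})
    # One entry per first-tier dimension: ids only, no nested detail.
    ET.SubElement(sale, "customer", {"id": customer_id})
    ET.SubElement(sale, "agreement", {"id": agreement_id})
    ET.SubElement(sale, "fulfillmentType").text = fulfillment_type
    ET.SubElement(sale, "date").text = sale_date
    # Optional: the fact that sits at the center of the star schema.
    ET.SubElement(sale, "amount", {"currency": currency}).text = f"{amount:.2f}"
    # Deliberately no product list, because it would be open-ended.
    return event


event = build_new_sale_event(
    sale_id="sale/234434",
    customer_id="customer/1001",    # hypothetical id
    agreement_id="agreement/77",    # hypothetical id
    fulfillment_type="online",
    sale_date="2008-01-15",
    amount=149.99,
    currency="USD",
)
print(ET.tostring(event, encoding="unicode"))
```

The message stays the same size no matter how many products are on the invoice. A subscriber that needs the line items has to ask for them using the sale id.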
Remember our scenario? We said that we implemented a CRM system in the USA and another one in Europe. The event message above comes in to the CRM system, but we don’t have an indicator if the customer is in the USA or Europe. So did we meet the criteria? Is this event message large enough, without being too large? Is it decoupled enough?
I believe that the answer to both questions is “Yes.” This is because we included the customer id. Clearly, a CRM system would know what customers it has. The CRM system would do a quick lookup against its own database to find the customer before deciding to process the message.
This allows the subscriber (CRM) to decide quickly if it needs to process the message or not, without going back to the publisher to ask for basic information like “what is the id of the customer.” It also allows the publisher to be independent of the subscriber, while still providing the fields that the business considers important for understanding the transaction itself.
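Here is a minimal sketch of that subscriber-side check. The `CrmSubscriber` class, the field names, and the customer ids are hypothetical, and a plain set stands in for the CRM's own customer database. It also assumes the customer already exists in one of the two systems.

```python
from typing import Dict, List, Set


class CrmSubscriber:
    """Toy CRM that keeps only the sales made to its own customers."""

    def __init__(self, region: str, known_customer_ids: Set[str]) -> None:
        self.region = region
        # Stand-in for a lookup against the CRM's own database.
        self.known_customer_ids = known_customer_ids
        self.purchases: Dict[str, List[str]] = {}

    def handle(self, event: Dict[str, str]) -> bool:
        """Return True if the event was processed, False if discarded."""
        customer_id = event["customer_id"]
        if customer_id not in self.known_customer_ids:
            return False  # not our customer: discard, no call to the publisher
        self.purchases.setdefault(customer_id, []).append(event["sale_id"])
        return True


europe = CrmSubscriber("Europe", {"customer/1001"})
usa = CrmSubscriber("USA", {"customer/2002"})

new_sale = {"sale_id": "sale/234434", "customer_id": "customer/1001"}

print(europe.handle(new_sale))  # True
print(usa.handle(new_sale))     # False
```

The event carries no hemisphere code, so the publisher knows nothing about how the CRM systems divide up the world.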
One practice to consider: when deciding what fields should be in your event messages, look at the information captured in the tables of your data warehouse. This applies especially to relevant fact tables and first level dimension tables.
If you can find a star schema in your BI database that matches the entity you are trying to model, consider yourself lucky. In that case, use the dimensions of the fact table to decide what values the company cares about, and model your events to contain most, if not all, of those fields.
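If the warehouse declares its foreign keys, you could even read the first-tier dimensions straight from the fact table. Here is a minimal sketch using SQLite's `PRAGMA foreign_key_list` through Python's `sqlite3` module. The schema is a made-up stand-in, your warehouse engine will have its own catalog queries, and many warehouses do not declare foreign keys at all.

```python
import sqlite3


def first_tier_dimensions(conn, fact_table):
    """Map each foreign-key column of the fact table to its dimension table."""
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    rows = conn.execute(f"PRAGMA foreign_key_list({fact_table})").fetchall()
    return {row[3]: row[2] for row in rows}


conn = sqlite3.connect(":memory:")

# Hypothetical first-tier dimensions, reduced to a key column each.
for dim in ("customer", "agreement", "fulfillment_type"):
    conn.execute(f"CREATE TABLE {dim} ({dim}_id INTEGER PRIMARY KEY)")

conn.execute("""
    CREATE TABLE sale (
        sale_id             INTEGER,
        customer_id         INTEGER REFERENCES customer(customer_id),
        agreement_id        INTEGER REFERENCES agreement(agreement_id),
        fulfillment_type_id INTEGER REFERENCES fulfillment_type(fulfillment_type_id),
        amount              REAL
    )
""")

dims = first_tier_dimensions(conn, "sale")

# Candidate event fields: the fact's own id plus one id per dimension.
event_fields = ["sale_id"] + sorted(dims)
print(event_fields)
```

The result is only a starting list. You still have to decide which dimensions to leave out, such as an open-ended list of products.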