Let’s say that you have two systems: Adipose and BellyFat. They both need the same information. Adipose handles customer transactions, so it needs information about customers. BellyFat handles the long-term management of customer information, like what products they have purchased and what rights they own. It also needs information about customers.
How do we keep the customer information in these two systems in sync? SOA offers three answers: Call-and-Response, Optimistic Data Sync, and Pessimistic Data Sync.
- Call-and-Response says: One system holds the data. The other does not. The system that does not hold the data calls the one that does, on a real time basis, and gets the data back quickly and efficiently.
- Optimistic Data Sync says: Both systems keep a copy of the data. If an event happens in either system, drop it on the bus. The other system will get the event, interpret it correctly, and update its internal store to reflect the change.
- Pessimistic Data Sync says: One system masters the data, but the other system keeps a local copy. If an event happens in either system, drop it on the bus. The other system will get the event, interpret it as best it can, and update its internal store according to its own business rules. On a periodic basis, the ENTIRE data structure will be copied from the master to overwrite the data in the local copies (data refresh).
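To make the third pattern concrete, here is a minimal sketch of Pessimistic Data Sync. The class names, the event shape, and the local rule that drops the `tier` field are all assumptions made up for illustration; they are not part of any real system described here.

```python
import copy


class Master:
    """Hypothetical system that masters the customer data."""

    def __init__(self):
        self.customers = {}

    def apply(self, event):
        self.customers.setdefault(event["id"], {}).update(event["changes"])


class LocalCopy:
    """Hypothetical system that keeps a local copy of the data."""

    def __init__(self):
        self.customers = {}

    def apply(self, event):
        # Assumed local business rule: this system has no notion of "tier",
        # so it interprets the event as best it can and drops that field.
        changes = {k: v for k, v in event["changes"].items() if k != "tier"}
        self.customers.setdefault(event["id"], {}).update(changes)

    def refresh_from(self, master):
        # Data refresh: overwrite the ENTIRE local store from the master.
        self.customers = copy.deepcopy(master.customers)


master, local = Master(), LocalCopy()

# Events dropped on the bus; both systems receive every one of them.
bus = [
    {"id": 1, "changes": {"name": "Ann", "tier": "gold"}},
    {"id": 1, "changes": {"name": "Ann Lee"}},
]
for event in bus:
    master.apply(event)
    local.apply(event)

drift_before = master.customers != local.customers
local.refresh_from(master)  # the periodic refresh
drift_after = master.customers != local.customers
```

In this sketch the event stream alone leaves the two stores different, and only the periodic refresh brings the local copy back in line with the master.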
Each of these three has its own strengths and weaknesses. One of them, Optimistic data sync, is so bad, however, that I’d like to call it out for special ridicule.
|Call and Response||
|Optimistic Data Sync||
|Pessimistic Data Sync||
You will choose one over the others depending on your tolerance for the “disadvantages” and your preference for the “advantages” of each method. However, and this is the focus of this blog post, one of these three is really not workable in the long run: Optimistic Data Synchronization.
The reason for my enmity for this approach is simple: this approach uses an underlying assumption that is poorly considered. That assumption is that it is fairly easy for two systems to stay in sync simply by keeping each other abreast of all of the events that have occurred in either one.
The problem with this assumption is that it is NOT easy for two systems to stay in sync. If the two systems don’t share an IDENTICAL data model, then each system has to interpret the messages from the other. The rules of that interpretation have to be coded in each system, and that code must stay perfectly in sync. Plus, there can be no errors in interpretation, or variations in the way that a change propagates throughout the recipient’s system. There can be no variations in the event model between the systems. No bugs either. Sure…. if we can get to the point where no humans are involved in writing computer applications, then this might make sense.
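A small sketch may help show how this goes wrong even when every message arrives. The two data models below are invented for illustration: assume Adipose stores exactly one address per customer, while BellyFat stores a list of them. Both receive the same three events, in the same order, with no losses.

```python
class Adipose:
    """Hypothetical model: exactly one address per customer."""

    def __init__(self):
        self.address = {}

    def handle(self, event):
        kind, customer, addr = event
        if kind == "address_added":
            # A new address overwrites the previous one.
            self.address[customer] = addr
        elif kind == "address_removed":
            if self.address.get(customer) == addr:
                del self.address[customer]

    def current_addresses(self, customer):
        return [self.address[customer]] if customer in self.address else []


class BellyFat:
    """Hypothetical model: a list of addresses per customer."""

    def __init__(self):
        self.addresses = {}

    def handle(self, event):
        kind, customer, addr = event
        if kind == "address_added":
            self.addresses.setdefault(customer, []).append(addr)
        elif kind == "address_removed":
            if addr in self.addresses.get(customer, []):
                self.addresses[customer].remove(addr)

    def current_addresses(self, customer):
        return list(self.addresses.get(customer, []))


events = [
    ("address_added", "c1", "1 Main St"),
    ("address_added", "c1", "9 Oak Ave"),
    ("address_removed", "c1", "9 Oak Ave"),
]

adipose, bellyfat = Adipose(), BellyFat()
for event in events:
    adipose.handle(event)
    bellyfat.handle(event)

adipose_view = adipose.current_addresses("c1")
bellyfat_view = bellyfat.current_addresses("c1")
in_sync = adipose_view == bellyfat_view
```

Neither handler has a bug by its own model's standards, yet Adipose ends up with no address for the customer while BellyFat still holds "1 Main St".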
Not long ago, I used to think that Optimistic data sync was a good idea, and that SOA developers should assume that their data exists outside their systems. Heck, I used to think that call and response was a good idea too. Both are bad in practice, with Optimistic sync being by far the worst. There are just too many common scenarios (like one system going down for a period of time, and coming back up after missing some messages) that drive down the overall integrity of data managed in this way.
While I’d go so far as to say that Pessimistic and Call-and-Response are reasonable patterns for a SOA architect to select, the optimistic data sync method should be considered an anti-pattern, and avoided as much as humanly possible.
16 thoughts on “SOA Optimistic Data Synchronization considered harmful”
Hi Nick, I can’t argue about the data model issue, but I guess you could use an integration layer to buffer transactions, i.e. a message-based middleware software capable of queuing transactions.
My discussion assumes the existence of the layer you describe. The pattern I’m criticizing, "SOA Optimistic Data Sync," is an antipattern for how that layer is used.
From your example it’s not clear what "customer transactions" means so it’s difficult to ask pointed questions about what data is being mastered. Also I don’t understand how the amount of "correctness" is managed differently between optimistic and pessimistic options. Any system with components that attempt to multi-master data is doomed to fail, regardless of its scalability/infrastructure.
More detail would be helpful in understanding your statement about keeping the rules of interpretation in sync across systems. e.g. Only one component in the system is the master of determining if a customer is "preferred" (perhaps the CRM component). Once a customer achieves/loses that status based on criteria defined by the master, that status is "dropped on the bus" and picked up by other components in the system. How a customer becomes preferred does not need to be replicated or understood by any other component. The marketing component will simply apply its logic to that change in status (send the customer a mailing), the pick/pack/ship component may process preferred customer orders ahead of non-preferred customer orders. These components remain unaffected if the criteria for becoming preferred changes and their data model for storing that information may be completely different from each other.
I know that highly scalable and highly reliable can be costly but durable MSMQ (which is not costly) can meet a lot of needs.
1) There are two versions of "optimistic data sync":
1.A) Published event contains <u>changes only</u> and another system interprets these changes.
1.B) Published event contains <u>changed entity with relevant context</u> (e.g. whole instance of Customer) and another system updates whole entity (not only changed data).
In (1.B) correctly processed event corrects previous errors. So "Data gradually gets out of sync, with no recourse to get it right" is not true for this case.
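The difference between (1.A) and (1.B) can be sketched as follows. The event shapes and field names are hypothetical, and the sketch assumes that both systems share the same fields, which is the favorable case.

```python
def apply_delta(store, event):
    # 1.A: the event carries only the changed fields.
    store.setdefault(event["id"], {}).update(event["changes"])


def apply_snapshot(store, event):
    # 1.B: the event carries the whole entity; replace the stored copy.
    store[event["id"]] = dict(event["entity"])


initial = {"name": "Ann", "email": "ann@old.example", "city": "Oslo"}
delta_target = {7: dict(initial)}
snapshot_target = {7: dict(initial)}

# Assumed scenario: an earlier event that changed the email was processed
# incorrectly, so neither target reflects it. The source now looks like this:
current = {"name": "Ann", "email": "ann@new.example", "city": "Bergen"}

# The next event (a city change) is processed correctly by both targets.
apply_delta(delta_target, {"id": 7, "changes": {"city": "Bergen"}})
apply_snapshot(snapshot_target, {"id": 7, "entity": current})

delta_is_current = delta_target[7] == current
snapshot_is_current = snapshot_target[7] == current
```

With changes-only events the stale email survives, while the whole-entity event repairs it as a side effect.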
2) I expect you don’t assume the entity is mastered/changed in both systems (there is probably no reason to update entity in BellyFat). I always try to achieve (2.A) or (2.B):
2.A) Every entity (or set of attributes) is mastered in one system only.
2.B) Entity is mastered in one system only during each phase of its lifecycle.
2.C) If none of previous is true, conflict resolution mechanism shall be defined.
Actually your "refresh mechanism" is one of the conflict resolution algorithms, but I would try to avoid it as much as possible! In your example I would prefer (2.A) – to master data in Adipose. If BellyFat wants to change data (for whatever reason) it would need to ask Adipose to change the data, and this change would be propagated back to BellyFat.
Have you been catching up on Doctor Who?
ROTFL, Loraine! I love that episode! I wasn’t thinking about it when I wrote the post, though.
My wife is a personal trainer, and I spent the weekend working on marketing materials for her business. I guess weight loss was on my mind.
I am wondering how and if master data management fits in here, since – from what I read – you can use all three for handling data with MDM. Is this basically what you’re talking about – but without an actual MDM solution sitting in the middle?
My preferred SOA stack is one that includes an Enterprise Information Bus (EIB) that is a mechanism for handling the movement of refresh data between systems. An MDM solution can be a large part of the EIB.
Note that MDM systems are not source data systems. They are used to manage the information that is kept in source systems.
I reject the notion that your mechanism 1.B is effective at preventing data from getting out of sync. It is an Excellent way to reduce the "drift" but is not 100% effective unless the data models under the two applications happen to be 100% aligned. Yea… that happens a lot.
I would agree with you that your Scenario #2 is solid advice. The scenario I outlined leaves the possibility that both systems are updated at the same time. Which leaves 2.C as a common, if not always preferable, situation.
It is interesting that you indicate that data refresh from the source system is one conflict resolution mechanism but not preferred. You didn’t indicate what other conflict resolution mechanisms you would use if you could not achieve scenario 2.A.
Look, it might be a good idea to have all of the data about a customer mastered in one system, but if you have a list of "point" systems, none of which have a data model wide enough to actually capture all of the necessary attributes, then you have exactly two options: make it work, or change one of the systems to cover all attributes.
This blog post is about the "make it work" path. If you have the time, money, and buy-in to identify all the entities, and change systems to cover all attributes from each one, then go for it. Until you are done, you will need a pragmatic approach. Perhaps this one will help.
Would separate patterns for master data and transaction data make sense? I come from the data warehouse space, we have a degree of tolerance for drift in transaction details, but clean master data is critical. We run into a lot more scalability challenges, and I can’t afford to do periodic full refreshes as specified by the pessimistic sync scenario.
Suppose you have a transactional system (Adipose) that updates data and, at the same time, informs another transactional system (BellyFat) to update that data. What solution do you offer for Adipose to remain aligned with BellyFat?
Neither system is a BI system.
It is entirely rational that NEITHER system is the master, and that an MDM solution gathers the data from both, works out the conflicts, creates a master data set, and then REFRESHES both transactional systems with system-specific views of the master data.
That is simply a variation on "Pessimistic Data Sync."
0) Thanks for answer.
1) [One-way integration] I believe it is possible to achieve "guaranteed processing" – that every message is processed. The following invariant shall hold:
– message was successfully processed
– or message is on the way (in a message queue, integration process is suspended or message is being processed by a component…)
– or message is in error log and will be processed by administrator manually (we need to set up operational procedures; it is inherent part of delivery of every integration solution)
Ok, there still can be (and will be) errors and we will need to identify them, resend events and reprocess them in target system. We do it sometimes, but I do not see reason to deliver ETL component as part of our solution (as I do not deliver solution for other types of bugs).
2) [Two-way integration]
2.A) Yes, I understand that not the whole entity can be mastered in one system. Most of the time a SUBSET of attributes can be mastered in one system. But of course there can be situations where it is not possible…
2.B) Regarding to conflict resolution – As I believe in (1), for me it is the same problem as for lazy replication of databases. There are algorithms, which depend on nature of data and changes (e.g. last change wins; if there are differential changes, they can be summed…).
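The two algorithms named here can be sketched briefly. The `Change` type and both function names are hypothetical, and "last change wins" assumes that both systems produce timestamps from comparable clocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    value: str
    timestamp: float  # assumed to come from comparable clocks


def last_change_wins(a, b):
    # Keep whichever change is newer; on a tie, keep the first argument.
    return a if a.timestamp >= b.timestamp else b


def merge_differentials(base, deltas):
    # Differential changes (e.g. +5, -20) commute, so they can be summed.
    return base + sum(deltas)


winner = last_change_wins(
    Change("ann@old.example", 1.0),
    Change("ann@new.example", 2.0),
)
balance = merge_differentials(100, [-20, 5])
```

Both rules resolve which write survives; neither says anything about whether the two systems mean the same thing by the field being written.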
Thank you for your response. For one-way integration without bulk data refresh, you seem to imply that the problem is in the delivery of the messages. That is not, and was never, my point. You can be 100% certain that every message is delivered, yet I can describe scenarios where the data still gets out of sync.
Errors caused by out-of-sync data happen commonly in systems of this sort, and the guys in "operations" who maintain the systems are used to fixing these problems. Sometimes it is with queries to identify and fix the "broken" data one record at a time. Other times, it is with ETL feeds to refresh it.
Regardless of how they have to fix it, the work has to be done. The fact that the operations guys are so good at it… doesn’t give us, the system designers, the right to ignore it as a use case, or to disregard the possibility in the design. The fact that you don’t deliver a solution to this problem doesn’t tell me that the problem is absent.
Your two-way integration points seem connected to the same misunderstanding: that you can fix the "drift" through techniques at the transaction level. That is true in one condition: that the data models behind the two systems are well aligned. If they are not, then both systems can interpret 100% of all incoming transactions, and the systems can still get out of sync, because they are "hearing the same words, yet taking different actions."
Regards, — N —
I think I understand you.
1) Regarding to "hearing the same words, yet taking different actions.":
It is strange to me, that you want to solve "semantic interoperability" problems by different INTEGRATION MECHANISM.
Whether you use batch integration mechanism (ETL) or event-driven integration mechanism, you have to map one model to another and implement transformation rules… If there is problem with "yet taking different actions", ETL shall be probably fixed by development team and tested too.
There is no magic use-case "Fix integration problem", which can solve integration problems by clicking on one button.
2) I understand, that the problems can be quite common and that there can be interest of development and operation teams to have standard mechanism (ETL), which would help to fix this kind of problems.
But maintenance of second integration mechanism can be quite expensive. Are you implementing it in practice or is it "theoretical concept" (for purpose of discussion) at the moment?
Thanks a lot.
I think we are actually saying the same things. Data model problems (semantic interoperability) can be addressed, but not solved, through integration workarounds. It is not preferable to address problems this way.
That said, IT sometimes doesn’t get a say in solving that problem. The problem may stem from political concerns, or industry concerns, or even choices made in the executive suite… sometimes IT cannot fix the data models to align them because the business insists on them being misaligned. Other times, it is just too much of a capital expense to fix the problem, and allowing an ongoing charge for data repair is the only option that the business can (is willing to) afford.
While this is not preferable, it is so common that the overwhelming majority of large or medium sized businesses will have some area where they have multiple software systems with incompatible data models. Overwhelming majority. Cannot be ignored.
So if we are going to talk about SOA stacks, where SOA is often used as a means of integration, we have to talk about SOA in the context of all means of integration… including the integration mechanisms that are not SOA.
The problem that I’m highlighting comes when a SOA architect chooses not to pay attention to these other means by assuming that transactional fidelity can solve a semantic interoperability problem. You and I agree that you cannot solve such a data issue with SOA alone. Yet many disagree.
Many folks, when designing the shared architecture of IT, will choose to ignore the fact that semantic interoperability issues exist at all. Some I’ve read about. Others I’ve met personally. No amount of convincing seems to work.
Am I implementing it? Of Course. (You probably are as well, but your operations guys may not be telling you about it). Nearly everyone has an ETL integration mechanism to fall back on, that is called into play on occasion. Sometimes, it is expensive. Each time it is necessary.
Yes, ETL will likely require some logic as well. A good mechanism uses an MDM approach, where the data extracted from one system is moved into a canonical structure (first transformation) and then maintained for the enterprise. It is then transformed into the target structure (second transformation) before being fed into the recipient system. Good MDM systems reduce the cost of defining, managing, and performing these transformations. That reduces the cost a bit.
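As a rough sketch of the two transformations, assume Adipose stores a single `full_name` field and BellyFat wants a "Family, Given" `display_name`. Every field name and both function names below are invented for illustration.

```python
def adipose_to_canonical(record):
    # First transformation: source structure to the canonical structure.
    given, _, family = record["full_name"].partition(" ")
    return {
        "customer_id": record["cust_no"],
        "given_name": given,
        "family_name": family,
    }


def canonical_to_bellyfat(canonical):
    # Second transformation: canonical structure to the target structure.
    return {
        "id": canonical["customer_id"],
        "display_name": f'{canonical["family_name"]}, {canonical["given_name"]}',
    }


source_record = {"cust_no": 42, "full_name": "Ann Lee"}
canonical = adipose_to_canonical(source_record)
target_record = canonical_to_bellyfat(canonical)
```

Keeping the canonical form in the middle means each system needs only one mapping to and from it, rather than one mapping per pair of systems.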
That said, an ETL mechanism has the advantage of defining the complete list. Unlike transactional mechanisms, there are no hidden gaps. You can effectively replace EVERY RECORD. (I wouldn’t start by deleting every record! There are better ways. But I would start by marking every record as "old", performing the data load, and seeing which "old" records remain when you are done, to flag them for possible deletion.)
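The mark, load, and flag approach can be sketched like this. The record layout (a `data` dict plus a `stale` marker) and the function name are assumptions chosen for illustration.

```python
def refresh(target, master_rows):
    """Load master_rows into target; return keys flagged for possible deletion."""
    # Step 1: mark every existing record as "old".
    for row in target.values():
        row["stale"] = True

    # Step 2: perform the data load; loaded records are no longer "old".
    for key, data in master_rows.items():
        target[key] = {"data": dict(data), "stale": False}

    # Step 3: whatever is still "old" was absent from the load.
    return sorted(key for key, row in target.items() if row["stale"])


target = {
    1: {"data": {"name": "Ann"}, "stale": False},
    2: {"data": {"name": "Bob"}, "stale": False},
}
master_rows = {
    1: {"name": "Ann Lee"},
    3: {"name": "Cy"},
}

flagged = refresh(target, master_rows)
```

Nothing is deleted by the load itself: record 2 stays in place and is only reported, so someone can decide whether it really should go.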
I hope this helps clarify my position.
1) Thanks again for your answer. It is interesting to read about work of other people.
2) We really do not implement ETL solution and our operation guys do not have any such tool… We had various incidents – e.g. we had to return with data in one system two days back and to reintegrate old data… We always resend and reprocess messages or we implement simple ad-hoc updates (we update one or two fields, not whole entities).