When you pass a message from one system to another, you have to decide: do I want the message to pass quickly, or do I want to be certain that the message gets there? But what does that do to decoupling? Here’s a specific case. Take a look at the decisions made and tell me: would you make the same choices?
We have a new system coming into existence that will create a business document using a new web interface. The legacy system that sits underneath also creates the same business document in a less-than-appealing manner. We put in the new system so that we can move people from the old system to the new one. Once everyone moves over, we can consider decommissioning the old interface. Eventually, we will replace the back end as well.
To see an image of the roadmap in another window, click here.
Of course, the devil is in the details. We are in step 1: adding the new interface to the legacy system.
For now, the legacy datastore has the domain data that the interface needs. There are two kinds of domain data here: small domains (like the list of U.S. States or Document type codes) and large domains (like lists of Customers or Products). In the latter category, we have one table in the legacy db that is quite large, so when the user needs a record from this table, we send a query message from the new system to the old to get search results. Note that in the future, this data will probably come from the performance cache or directly from an upstream system. It does not “belong” to the legacy app.
The user is waiting on this query to return. This is important.
The question is this: we are trying to keep the two systems as decoupled as possible. In that vein, all other transactions between the systems happen through a Biztalk interface. We can handle orchestration, indirection, mapping, and isolation using Biztalk.
However, for looking up data in the large table, we want to get the search results quickly. We don’t need to translate the fields. We need message speed, but not message reliability. If the user cannot search, then the process of creating the business document is stopped anyway. (The process is similar to creating an order online… if you can’t see the product catalog, it’s hard to create the order). The data source is highly reliable already, so there is no need to improve that in the messaging system.
So I have to consider: do we make a direct dependency between the new system and the old one by adding a direct call, completely circumventing Biztalk, or do we keep ALL of our connections running through Biztalk so that we can maintain all relationships in one place?
An image that illustrates this choice is here. The choice is whether or not to put in the direct dependency, illustrated as a blue arrow.
The development team chose to go ahead and add this dependency.
For Reliable EAI calls, the transactions run through Biztalk. For direct queries, the developers went with a separate direct call to get the search results. An additional direct call was added to get the rest of the domain data.
So, now the two systems are connected through two pathways. One dependency runs directly to get domain data from legacy to new, while the other runs through a messaging interface.
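To make the two pathways concrete, here is a minimal sketch of how the new system might keep both behind a single gateway, so that the direct dependency at least stays visible in one place. Everything in it is an assumption for illustration: the Gateway class, the Delivery enum, the message types, and the two stand-in transports are hypothetical and are not taken from the actual design or from any Biztalk API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Delivery(Enum):
    RELIABLE = "reliable"  # must arrive; routed through the hub
    FAST = "fast"          # read-only lookup; the user is waiting on it


@dataclass
class Gateway:
    """Single place in the new system that knows about both pathways."""
    send_via_hub: Callable[[dict], dict]   # hypothetical hub adapter
    query_direct: Callable[[dict], dict]   # hypothetical direct lookup
    log: List[str] = field(default_factory=list)

    def send(self, message: dict, delivery: Delivery) -> dict:
        # Record which path was used so the direct dependency stays auditable.
        self.log.append(f"{delivery.value}:{message['type']}")
        if delivery is Delivery.RELIABLE:
            return self.send_via_hub(message)
        return self.query_direct(message)


# Stand-ins for the real transports, for illustration only.
def fake_hub(message: dict) -> dict:
    return {"status": "queued", "via": "hub"}


def fake_direct(message: dict) -> dict:
    return {"status": "ok", "via": "direct", "rows": []}


gateway = Gateway(send_via_hub=fake_hub, query_direct=fake_direct)
saved = gateway.send({"type": "SaveDocument"}, Delivery.RELIABLE)
found = gateway.send({"type": "CustomerSearch", "term": "acme"}, Delivery.FAST)
```

The point of the sketch is only that callers state the delivery guarantee they need, and the choice of pathway is made in one spot rather than scattered through the code.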
The reasons for this decision are obscured from me at the moment, but I believe that, when I ask, I will hear: we wanted performance and it is slower to run through BTS (somewhat true). As I see it, I have three options:
1) Approve the design and don’t worry about it. (always an option)
2) Ask that the performance cache be more formalized in future releases, so that I’m sure that the dependencies are centrally managed and that the cache isn’t treated as a new data master. This may add complexity, and a constraint or two, but probably won’t affect performance.
3) Kill the additional dependency and require that all data queries run through the Biztalk engine.
(I’m leaning towards #2.)
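For what it is worth, a formalized performance cache in the spirit of option 2 could be as simple as a read-through cache that never accepts writes from its callers. The sketch below is an assumption about what that might look like, not the design itself: the ReadThroughCache class, the fetch function, and the 60-second lifetime are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class ReadThroughCache:
    """Read-only cache in front of a source of record.

    Callers can only search. Every entry is a copy that expires and is
    refetched from the source, so the cache cannot become a data master.
    """

    def __init__(self, fetch: Callable[[str], List[dict]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # the only dependency on the source system
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[dict]]] = {}

    def search(self, term: str) -> List[dict]:
        now = self._clock()
        hit = self._entries.get(term)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # fresh copy, no call to the source
        rows = self._fetch(term)   # miss or stale entry: go to the source
        self._entries[term] = (now, rows)
        return rows


# Illustration with a fake source and a fake clock (both hypothetical).
calls: List[str] = []


def fetch_customers(term: str) -> List[dict]:
    calls.append(term)
    return [{"id": 1, "name": "Acme"}]


fake_now = [0.0]
cache = ReadThroughCache(fetch_customers, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.search("acme")    # goes to the source
second = cache.search("acme")   # served from the cache
fake_now[0] = 61.0
third = cache.search("acme")    # entry expired, goes to the source again
```

Because the fetch function is the only link to the source, swapping the legacy database for an upstream system later would mean changing one function and nothing else.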
What do you think?
4 thoughts on “Should a performance cache query run through your EAI hub?”
Nick – I’m leaning toward #2 or #3 here because once the precedent for an exception like this is established, it will come up again and again. In time, it’s no longer a precedent, but a de facto architectural standard.
Has there been any strawman performance testing or simulation of this? Have the data volumetrics involved been measured or at least estimated? If speed is an issue, there are a number of ways of optimizing the framework such that direct DB calls don’t have to be made. I’m thinking more along the lines of data model and/or DB server optimizations that could be applied while maintaining the integrity of the SOA structure – kind of a pseudo-cache, if you will.
Option 1 is a band-aid that will eventually turn into a prosthetic…take a look at the other technical alternatives first.
Nick, is the performance concern related to the query time on the legacy DB, or is it the size of the payload being returned across your message bus? If it is the former, you will most likely not see a noticeable performance gain, because the bottleneck is on the source DB. If the payload size is the issue, then creating a point-to-point connection will optimize performance.
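One rough way to tell the two cases apart is to time the query and the payload handling separately. The sketch below is only an illustration under stated assumptions: an in-memory SQLite table stands in for the large legacy table, JSON serialization stands in for the cost of building the payload (it does not measure network transfer), and the table, column, and function names are all hypothetical.

```python
import json
import sqlite3
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Stand-in for the large legacy table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)",
                 [(f"Customer {i}",) for i in range(10_000)])


def run_query(pattern):
    # Stage 1: time spent in the source database.
    sql = "SELECT id, name FROM customer WHERE name LIKE ?"
    return conn.execute(sql, (pattern,)).fetchall()


def build_payload(rows):
    # Stage 2: time and size of the message that would cross the bus.
    return json.dumps([{"id": r[0], "name": r[1]} for r in rows]).encode("utf-8")


rows, query_seconds = timed(run_query, "Customer 12%")
payload, payload_seconds = timed(build_payload, rows)

report = {
    "rows": len(rows),
    "payload_bytes": len(payload),
    "query_seconds": query_seconds,
    "payload_seconds": payload_seconds,
}
print(report)
```

If the query stage dominates, a point-to-point call buys little; if the payload stage and size dominate, the transport is the place to look.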
As the previous reply stated, I would stick with #2 unless the expected performance is beyond the user’s expectations.
The dev team is not all that familiar with SOA, and for them, anything having to do with Biztalk might just as well be a "Houdini call" (magic, even though they know better).
I think the concern is that any call to an EAI bus has to be ‘slow’ because it’s a call to an EAI bus, and not directly to the db.
Aside from the fact that adding the call was a break from the design, I don’t think it was wrong, per se. It means that the design needs to take the performance cache into account in Step 1, and not to wait until Step 3 to formalize it.
The architect is not always right. (I wish I were THAT good). I don’t agree with the reasons, really, because I don’t see it as a performance issue. That said, there are clear advantages to having a performance cache (which is why it was on the roadmap in the first place), and there is no particular reason why calls to the performance cache need to run through the EAI structure.
All in all, I’m not able to come up with a good reason to force the team to option #3 other than purity of design, which is a pretty thin reason.
Time to update the design spec. I’m going with option #2.
Ah, the age-old "everything going through the bus must be slower, and we need spectacular performance, therefore we must go direct even though it defies all the benefits of the bus" argument (or however you want to word the same argument that has been coming out of project teams since architects tried to impose any sort of order).
1 – What truly are the performance requirements? I know that a WS call through our bus and a direct WS call differ by less than half a second, and that’s on a bad day. Adding 150 – 200 milliseconds to a transaction really isn’t that much for most corporate users.
2 – What are the benefits your bus can provide? Tracking and monitoring are two that we provide that come to mind right away.
3 – Standardization, or purity of design, isn’t really a strong reason as you mention, but, it is a reason and one that should be considered for future support reasons.
Performance is the cop-out that almost every project team uses. We ask them to justify it with SLAs before they can use that argument. Certainly the documented performance cache will at least alleviate some of your concerns, and you have to decide how much you’re willing to fight over what sometimes can be a trivial point, but that can’t be the mindset that we approach this with.
That said, as you said, the architect is not always right, and you’re not going to win them all. I just hate when people use a cop-out that they haven’t really documented yet. What we’re trying, right now actually, is to complete and document some performance tests that show the actual impact of the bus on a variety of different call types so that we at least have the facts to work from.
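A rough sketch of what such a test harness might look like follows. It is an assumption for illustration, not our actual test suite: the two simulated calls use sleeps as stand-ins for a direct call and a call through the bus, and the function names and the 500 ms SLA figure are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure(call: Callable[[], object], runs: int) -> List[float]:
    """Time a call repeatedly and return the latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Median and nearest-rank 95th percentile of the samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer math
    return {"median_ms": statistics.median(ordered), "p95_ms": ordered[rank - 1]}


def bus_overhead(direct: Dict[str, float], via_bus: Dict[str, float]) -> Dict[str, float]:
    """Extra latency the bus adds for each statistic."""
    return {key: via_bus[key] - direct[key] for key in direct}


def meets_sla(summary: Dict[str, float], sla_p95_ms: float) -> bool:
    return summary["p95_ms"] <= sla_p95_ms


# Simulated calls, for illustration only; swap in the real ones to test.
def direct_call() -> None:
    time.sleep(0.001)


def bus_call() -> None:
    time.sleep(0.003)


direct_summary = summarize(measure(direct_call, runs=20))
bus_summary = summarize(measure(bus_call, runs=20))
print("overhead:", bus_overhead(direct_summary, bus_summary))
print("bus meets 500 ms SLA:", meets_sla(bus_summary, sla_p95_ms=500.0))
```

Running the same harness once per call type gives a table of overheads to put next to the SLA, which is the kind of fact base I would want before anyone argues for going direct.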