One troubling image has started to emerge for me. Perhaps my perspective is too filtered by my own encounters, and I’m hoping that it is. It’s an image of a future without consistent ETL tools… and it’s truly unsettling.
One of the advantages of being a consultant is that I get to spend time in many different organizations, talking to many different people. I don’t get the filtered, comfortable view that internal folks get. If someone is paying for me to be there, a problem has reared its ugly head and they need help. So most days I am looking at problems, not smooth-running situations.
But one thing has started to emerge. Many folks are not promoting consistent ETL tools anymore. I get it… the use of ETL has declined as more service-based integration has been occurring. ETL tools are mostly used to synchronize two different databases in an enterprise, either through rapid capture of a change or through periodic batch updates moving from one system to another.
We’d all like to pretend that a services-based, event-driven system can keep two databases in sync (even if one of them is not a database, or is a NoSQL database). But the reality is: event-driven systems occasionally drop an event. And when they do, one system will “know” a fact while another will not.
ETL tools are the only IT tools that can fix that situation. Whether open source, like Talend, or widely used, like SQL Server Integration Services, or wildly full-featured, like Informatica, an ETL suite can be invaluable in ensuring enterprise-level Master Data Management and consistent integration.
What’s more important, honestly, is not that ETL tools are used, but that they are used consistently. From an architect’s standpoint, there needs to be an agreed-upon standard for integration that the enterprise can support and can use to understand the provenance of every data element and what the value actually means.
Yet, in some organizations, it is now becoming more common to hear people say “we will just call an API. No need for a standard integration tool. No need for ETL at all.”
Let me stand up and disagree with that statement. I am quite glad to see message bus integration take the upper hand in organizations, but I also recognize the intrinsic value of having a standard approach to the “other bus”, the data bus.
When we integrate information between systems using an API layer or an event-driven message layer, that is the “message bus”. Whether it is a point-to-point call or an async message handler, the message is sent in near real time, and the systems are synchronized one row at a time. Systems stay in sync quickly, and the overwhelming majority of systems are quite capable of handling that.
But messages will be lost. It is inevitable. Using the message bus as the ONLY mechanism for system integration is a disaster waiting to happen. You need a fail-safe to ensure that any integrated set of systems actually shares the same information, especially if it is master data like the list of customers, the list of products, or even the list of countries and regions in which your organization sells things. That shared platform is the data bus, and the data bus is managed by ETL tools.
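To make that fail-safe a little more concrete, here is a minimal sketch, in Python, of the kind of periodic batch reconciliation an ETL job performs behind the scenes. Everything in it is hypothetical: the Customer record, the reconcile and apply_repairs functions, and the two in-memory dictionaries standing in for real databases are my own illustration, not the API of any ETL product. It also assumes that one system (here, the CRM) is the agreed system of record for customers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical master-data record shared by two systems."""
    customer_id: str
    name: str
    country: str


def reconcile(source, target):
    """Compare two snapshots keyed by customer_id.

    Returns three lists describing how `target` has drifted from `source`:
    records missing from target, records whose values are stale in target,
    and ids that exist only in target.
    """
    missing = [source[k] for k in sorted(source.keys() - target.keys())]
    stale = [
        source[k]
        for k in sorted(source.keys() & target.keys())
        if source[k] != target[k]
    ]
    orphaned = sorted(target.keys() - source.keys())
    return missing, stale, orphaned


def apply_repairs(target, missing, stale, orphaned):
    """Bring target back in line with the system of record (an assumption:
    a real job might quarantine orphans for review instead of deleting)."""
    for record in missing + stale:
        target[record.customer_id] = record
    for customer_id in orphaned:
        del target[customer_id]


# System of record: every customer event was captured here.
crm = {c.customer_id: c for c in [
    Customer("c1", "Acme", "US"),
    Customer("c2", "Globex Corp", "DE"),   # a rename event was published...
    Customer("c3", "Initech", "CA"),       # ...and so was a create event
]}

# Downstream system: the rename and the create were dropped on the way,
# and a delete for c4 never arrived.
billing = {c.customer_id: c for c in [
    Customer("c1", "Acme", "US"),
    Customer("c2", "Globex", "DE"),
    Customer("c4", "Hooli", "US"),
]}

missing, stale, orphaned = reconcile(crm, billing)
apply_repairs(billing, missing, stale, orphaned)
```

The point of the sketch is the shape of the job, not the code itself: the message bus keeps systems close in near real time, and a scheduled comparison against the system of record catches whatever the bus dropped. A real ETL suite adds the parts I have left out, such as scheduling, transformation, and a record of where each value came from.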
So, my friends, please pay attention. Look around to see if your organization’s embrace of the message bus has allowed it to forget about the data bus. The best indicator that it has: the lack of a standard for ETL.