In my discussions with clients, prospects, and students, and while networking with folks at seminars, I am always asked for my opinion or recommendations on data integration and ETL (extract, transform and load) products. People like to talk products, and much of the industry literature centers on tools.
I’m happy to discuss products, but every once in a while someone asks me a more insightful question, which is what happened this week. That person asked what the main shortcomings or stumbling blocks are that companies encounter when implementing data integration.
Great question. I cover this when working with clients and teaching courses, but hardly anyone asks about it directly or steers the discussion in that direction.
My answer is simple: it’s not the tool, but how you use it, that determines success. Although you do have to know the mechanics of the tool, that is not the critical success factor. What really matters is the mechanics of data integration.
Many people don’t understand data integration processes or the frameworks that products provide to implement those processes. And it’s not just data integration newbies who have this problem; it’s also experienced veterans.
Most data integration architects, designers and developers got their start in ETL by writing SQL scripts or hand-coding in something like Java with JDBC. Then they try to replicate that manual code inside the data integration tool. This is probably the worst way to use a data integration product: you get little benefit from the framework, processing is not optimized (it may even be terrible) and, worse, developers get frustrated because they feel they could have hand-coded it faster.
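To make that anti-pattern concrete, here is a rough sketch, in Python, of the kind of hand-coded, row-by-row ETL script many of us started with. The database files, tables and column names are hypothetical, purely for illustration.

```python
import sqlite3

# A hand-coded, row-by-row ETL script of the kind many developers wrote
# before adopting a data integration tool: extract rows, transform them
# in application code, and insert them one at a time.
# The databases, tables and columns below are hypothetical; the source
# "customer" table is assumed to already exist.

src = sqlite3.connect("source.db")
tgt = sqlite3.connect("warehouse.db")

tgt.execute("""CREATE TABLE IF NOT EXISTS customer_stage (
                   customer_id INTEGER, full_name TEXT, country TEXT)""")

for customer_id, first, last, country in src.execute(
        "SELECT customer_id, first_name, last_name, country FROM customer"):
    # Transformation logic buried in procedural code: no shared metadata,
    # no restartability, no built-in error handling or monitoring.
    full_name = f"{first.strip()} {last.strip()}".title()
    tgt.execute("INSERT INTO customer_stage VALUES (?, ?, ?)",
                (customer_id, full_name, country.upper()))

tgt.commit()
src.close()
tgt.close()
```

Recreating this same record-at-a-time, procedural logic inside a data integration tool is exactly how developers end up with little benefit from the framework.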
Welcome to the world of frustrated data integration developers, where people either assume these products are not useful or conclude that the particular product they used must not be very good.
Almost all data integration products provide data imports/exports; data and workflows; data transformations; error handling; monitoring; performance tuning; and many processes that have evolved as best practices, such as slowly changing dimensions (SCD), change data capture (CDC), and hierarchy management. All of these pre-built capabilities mean that data integration developers do not have to reinvent the wheel; they can leverage industry best practices to build world-class integration. Instead, many data integration developers spend their time re-creating the equivalent of manually coded imports, extracts and transforms without ever getting to the best practices that would best serve their business.
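As one example of what "pre-built best practice" means, here is a minimal, illustrative sketch of Type 2 slowly changing dimension logic, the kind of pattern most tools ship as a configurable component rather than something you hand-code. The record layout and attribute names are hypothetical.

```python
from datetime import date

# Illustrative sketch of Type 2 slowly changing dimension (SCD) logic:
# when a tracked attribute changes, expire the current version of the
# record and add a new current version. Attribute names are hypothetical.

def apply_scd_type2(dimension_rows, incoming, today=None):
    """Expire the current version of a changed record and append a new one."""
    today = today or date.today()
    for row in dimension_rows:
        if row["customer_id"] == incoming["customer_id"] and row["is_current"]:
            if row["address"] == incoming["address"]:
                return dimension_rows  # no change, nothing to do
            # Close out the current version.
            row["is_current"] = False
            row["end_date"] = today
            break
    # Add the new version with an open-ended validity period.
    dimension_rows.append({
        "customer_id": incoming["customer_id"],
        "address": incoming["address"],
        "start_date": today,
        "end_date": None,
        "is_current": True,
    })
    return dimension_rows

# Usage: a changed address expires the old row and adds a new current row.
history = [{"customer_id": 1, "address": "12 Main St",
            "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
apply_scd_type2(history, {"customer_id": 1, "address": "99 Oak Ave"})
```

In most data integration products this logic, along with CDC and hierarchy management, is configured rather than written, which is precisely the leverage that hand-coding gives up.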
Any successful, productive, robust data integration effort needs people who understand the necessary processes and can implement best practices. Buying the tool and having people who know its mechanics is only the beginning; you will get nowhere fast until you also have people who understand data integration processes.