This is a continuation of an earlier post that discussed the problems of hand-coding within ETL tools.
What Went Wrong?
There are two aspects to using an ETL tool effectively. The first is learning the tool’s mechanics, for example by taking the vendor’s training, either in a class or through online tutorials. Most IT people have no problem learning a tool’s syntax; since they most likely already know SQL, they pick up the tool very quickly.
But the second aspect involves understanding ETL processes themselves. This includes knowing the data-integration processes needed to gather, conform, cleanse and transform data; understanding not only what dimensional modeling is but why and how to deploy it; being able to implement slowly changing dimensions (SCD) and change data capture (CDC); understanding the data demands of business intelligence; and being able to implement error handling and conditional processing.
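To make one of these concepts concrete, here is a minimal Python sketch of a Type 2 slowly changing dimension, the variant that keeps history by closing the current row and adding a new one instead of overwriting. Everything in it is an assumption for illustration: the `CustomerRow` layout, the `apply_scd2` helper and the column names are hypothetical and do not correspond to any particular ETL tool's built-in SCD transform.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int            # natural (business) key
    city: str                   # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date]    # None means "this is the current version"


def apply_scd2(dimension: List[CustomerRow], customer_id: int,
               new_city: str, as_of: date) -> List[CustomerRow]:
    """Return a new dimension list with a Type 2 change applied."""
    result = []
    needs_new_version = True
    for row in dimension:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city == new_city:
            # Nothing changed: keep the current version as it is.
            needs_new_version = False
            result.append(row)
        elif is_current:
            # Attribute changed: close the old version rather than overwrite it.
            result.append(replace(row, valid_to=as_of))
        else:
            result.append(row)
    if needs_new_version:
        # Covers both a changed customer and a brand-new customer.
        result.append(CustomerRow(customer_id, new_city, as_of, None))
    return result


dim = [CustomerRow(1, "Boston", date(2020, 1, 1), None)]
dim = apply_scd2(dim, 1, "Denver", date(2021, 6, 1))
for row in dim:
    print(row)
```

Even this toy version has to decide what counts as a change, how to close the old row and how to treat a customer it has never seen. A real dimension adds surrogate keys, multiple tracked attributes and late-arriving data on top of that.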
Without understanding the why of ETL processing, IT developers either quickly become disillusioned with ETL tools or simply underutilize them. Typically, these implementations end up with the ETL tool merely executing SQL scripts or stored procedures, that is, hand-coding.
These hand-coded processes within ETL tools cause big trouble. First, the tools have built-in transforms such as SCD and CDC; if you don’t use them, you re-invent the wheel (you code something you already bought). In doing so, you’re likely doing something inefficient at best and outright wrong at worst.
Second, ETL tools are built to extract, transform and load data more efficiently than hand-coded SQL typically can.
Third, the IT staff is not likely to code the extensive error handling or audit routines that come pre-built in ETL tools. This lessens productivity and responsiveness to data-quality issues.
Fourth, hand-coded processes are often not documented or, if they are documented initially, the documentation is not likely to be maintained.
Finally, each hand-coded operation is a custom job that each new developer has to learn, versus being able to bring in a developer who knows an ETL tool.
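As a rough illustration of the third point, the sketch below shows the kind of error handling and audit bookkeeping a hand-coded load step would need to carry itself. It is a minimal Python example under stated assumptions: the `LoadAudit` record, the `load_orders` function and the `order_id` and `amount` fields are hypothetical names invented for illustration, not part of any ETL product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LoadAudit:
    """Hypothetical audit record for one run of a load step."""
    rows_read: int = 0
    rows_loaded: int = 0
    rejects: List[Tuple[dict, str]] = field(default_factory=list)  # (row, reason)


def load_orders(raw_rows: List[dict]) -> Tuple[List[Dict], LoadAudit]:
    """Validate and convert rows, routing bad ones to a reject list."""
    audit = LoadAudit()
    loaded = []
    for row in raw_rows:
        audit.rows_read += 1
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
            if amount < 0:
                raise ValueError("negative amount")
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the bad row and the reason so someone can follow up later.
            audit.rejects.append((row, f"{type(exc).__name__}: {exc}"))
            continue
        loaded.append({"order_id": order_id, "amount": amount})
        audit.rows_loaded += 1
    return loaded, audit


rows = [
    {"order_id": "1", "amount": "9.5"},
    {"order_id": "x", "amount": "1"},   # bad key value
    {"order_id": "3"},                  # missing amount
]
loaded, audit = load_orders(rows)
print(audit.rows_read, audit.rows_loaded, len(audit.rejects))
```

Every hand-coded step needs something like this written, tested and kept consistent with the others, which is exactly the work that tends to get skipped.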
How to Avoid Repeating History’s Mistakes
You don’t know what you don’t know. It’s not that the IT staff wants to use these ETL tools incorrectly or inappropriately; they just don’t know any better.
I’ll keep preaching that data-integration processes should be developed using ETL tools rather than hand-coding. But what I have learned along the way is that I also need to advocate that anyone using these tools learn not just the tool but, more importantly, ETL processing.
FYI: A good starting place is my articles on ETL. Check out my corporate library, which points to my articles, posts, webinars, podcasts and white papers.