(This is part of our ongoing Series of Unfortunate Data Warehousing and Business Intelligence Events. Click for the complete series, so far.)
A fundamental flaw of many business intelligence solutions is recreating what the company is already using for reporting and analysis. This takes one of two paths:
1) The data warehouse is built using essentially the source systems’ data model. It may be “cleaned up” with new names and use only a subset of the source data, but it is really just a retread of what you already have.
It does shift your reporting from the source systems to a DW, but you have not taken advantage of the advanced dimensional modeling techniques that have evolved to provide superior analytic performance. An entity-relationship (ER) model or third normal form (3NF) is indeed best practice for transactional systems, but not for business intelligence or data integration. IT knows 3NF and hence figures that is what they should do; many experienced practitioners started off using 3NF and continue to do so.
The cost is longer development times and more labor-intensive maintenance. It also performs slower than a best-practice design, so many companies compensate by buying more infrastructure such as CPUs, memory, storage and network bandwidth. If you sell or resell hardware, then using this design is fine, but if you are a consumer of BI solutions you should try another way.
2) At the other end of the spectrum from 3NF is recreating your current reporting solutions, often data shadow systems or spreadmarts, which basically flatten out the data. It is easy to see why people recreate the spreadsheets the business people are using for reporting, but it leads to inflexible reports, and more and more of them have to be built every time the business changes or expands its reporting requirements.
The fundamental concept behind dimensional modeling and OLAP (online analytical processing) design is to give business people flexibility in their reporting and analysis. This is how a company can enable business self-service reporting rather than have a large group of BI developers designing, building and maintaining dozens or hundreds of custom reports.
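To make the flexibility point concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. Every table name, column name and data value below is hypothetical and invented for illustration; a real dimensional model would have far richer dimensions. The point is only that one fact table surrounded by dimensions can answer many differently sliced questions without a new report being built for each.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to three small dimensions.
SCHEMA = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    amount      REAL
);
"""

# Whitelist of attributes a business user may slice by (illustrative only).
DIMENSIONS = {
    "year": "d.year",
    "quarter": "d.quarter",
    "category": "p.category",
    "region": "r.region",
}


def build_star(conn):
    """Create the hypothetical schema and load a handful of made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, "Q1"), (2, 2023, "Q2")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Support plan", "Services")])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 100.0), (1, 2, 2, 50.0),
                      (2, 1, 2, 200.0), (2, 2, 1, 25.0)])


def sales_by(conn, *dims):
    """Total sales grouped by any combination of whitelisted dimension attributes."""
    if not dims:
        raise ValueError("pick at least one dimension")
    cols = ", ".join(DIMENSIONS[d] for d in dims)  # KeyError on unknown names
    sql = f"""
        SELECT {cols}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_region r  ON r.region_key = f.region_key
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(sales_by(conn, "category"))
    print(sales_by(conn, "quarter", "region"))
```

The same `sales_by` helper serves sales by category, by region, or by quarter and region together. A flattened spreadmart table built for just one of those layouts would need a new report, and often a new extract, for each of the others.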
Just as with 3NF, the “flat world” approach to data mart design results in a much higher TCO and the huge queue of report development one sees in many BI implementations. Most assume that the queue and the costs come with the territory, but it does not have to be that way.
I will follow up with more unfortunate events I have observed. Feel free to e-mail me with the unfortunate events you have seen.