One debate that comes up whenever a data warehouse is designed is whether it should be normalized. Specifically, should it use third normal form (3NF) or a dimensional model?
This debate is often an ideological battle in which people cite Inmon or Kimball to justify their position. At this level, the debate is about theory rather than the business, data or analytical needs of the enterprise's business people. But before people build a data warehouse, they must understand those needs, as well as the industry best practices that will help fulfill them.
The biggest reason IT groups have this debate is that their view of dimensional data modeling is too simplistic. IT developers generally view dimensional models as fact and dimension tables arranged in either a star or snowflake schema. IT understands how to implement basic concepts such as surrogate keys and slowly changing dimensions (SCD), but rarely, if ever, uses many of the advanced (also known as hybrid) design constructs.
They see the advanced concepts as esoteric: rapidly changing, causal, hot swappable, heterogeneous or junk dimensions; how to implement hierarchies; bridge and outrigger tables; and when to use the various categories of fact tables.
(Admit it, it was tough just reading this sentence without thinking it was time to check your Facebook page!)
So why is it so tough to grasp these advanced concepts? A big part of the problem is that they are generally explained in an academic context. They’re not being connected to the real-world use cases where they should be used. Thus, they become geek-speak and are ignored.
Complicated as they may sound, the advanced dimensional design approaches were each formulated in response to real-world business and data requirements that occur across all enterprises. Rather than being esoteric, these concepts reflect a pragmatic approach to implementing successful data warehouse and BI solutions.
Until IT understands the depth and practicality of advanced dimensional modeling, the choice between a normalized model and a (simplistic) dimensional model is a false debate. IT either builds an overly complex 3NF data warehouse that quickly becomes overwhelming, or builds an overly simplified dimensional model that must be continually overhauled to support inevitably expanding and changing business requirements. In either case, the business is underserved when it comes to getting the information it needs, and the costs of BI keep rising without the expected business ROI.
If companies understand and implement advanced dimensional models, then they can leverage the best practices that have been developed through years of real-world experience.