I’m a data modeler, so I enjoyed Jonathon Geiger’s recent article entitled “Why Does Data Modeling Take So Long”. But why does he say it like it is a bad thing?
Mr. Geiger’s bottom line is exactly right: “Most of the time spent developing data models is consumed developing or clarifying the requirements and business rules and ensuring that the data structure can be populated by the existing data sources.” On the projects he describes, no one took the time before database design started to determine the available data sources or to identify the business entities of interest, the relationships among them, and the attributes that describe them, so the data modeler had to do it.
Taking the second point first, we often think modeling takes a long time because we don’t recognize the need for conceptual data modeling in requirements. I’ve written that “using data modeling techniques in requirements analysis reduces errors by improving requirements completeness, consistency, and communication, and provides unique continuity between analysis and design.” The International Institute of Business Analysis (IIBA) must agree: the Business Analysis Body of Knowledge (BABOK) lists data modeling among the tools available to requirements analysts. Its purpose, according to the BABOK, is “to describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them.”
For systems like data marts and warehouses that pull from existing source databases, investigation of current sources is a prerequisite of modeling. Typically, some required data will not exist in source systems, and source data structures often contain inconsistencies and idiosyncrasies that modelers must understand before designing the database. Mr. Geiger cites null values in a mandatory source field, a common problem in my experience.
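As a rough illustration of the kind of source profiling described above, here is a minimal Python sketch that counts missing values in fields the source system claims are mandatory. The function name `profile_mandatory_fields`, the sample rows, and the field names are all hypothetical, and treating blank strings as missing is an assumption; a real effort would run an equivalent check against the actual source tables.

```python
def profile_mandatory_fields(rows, mandatory_fields):
    """Count rows where each supposedly mandatory field is missing or blank."""
    missing = {field: 0 for field in mandatory_fields}
    for row in rows:
        for field in mandatory_fields:
            value = row.get(field)
            # Assumption: None and whitespace-only strings both count as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return missing


# Hypothetical extract from a source customer table.
source_rows = [
    {"customer_id": 1, "name": "Acme", "region": "East"},
    {"customer_id": 2, "name": "", "region": None},
    {"customer_id": 3, "name": "Globex", "region": "West"},
]

report = profile_mandatory_fields(source_rows, ["customer_id", "name", "region"])
print(report)
```

Any nonzero count for a field documented as mandatory is a finding the modeler needs to resolve with the business before settling on the target structure.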
However, there are two reasons this is good news rather than bad.
First, if data modelers take time to make up for missing analysis, they can save the project. There is simply no way to design a satisfactory database without understanding the business entities, relationships, and attributes, and the data that will feed the database. By taking time to figure these things out, modelers not only design the right database but also positively influence the design of the application that uses it. Modeling schedule overruns can be time well spent.
Second, I’ve seen managers go through the dynamic that Mr. Geiger describes and learn to start data modeling earlier. These project planners learn from their experience and bring in the data folks early, front-loading their work in the requirements process. I’ve found in those cases that data modeling substantially improves the quality of requirements, and as a result the chances of a successful project.
One final note: all this is still the case on an Agile BI effort. Requirements may be less structured, and iteration scope is of course much smaller, but sources must still be profiled, and business entities, relationships, and attributes understood, before a database can be designed successfully.