This is a screen capture of the results of last week’s unscientific data quality poll. In that poll, I noted that in many organizations a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single system of record containing fully integrated and historical data. Although the rallying cry and promise of the data warehouse has long been that it will serve as the source for most of the enterprise’s reporting and decision support needs, many data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.
Based on my personal experience, the most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered. However, since that’s just my opinion, I launched the poll and invited your comments.
Commendable Comments
Stephen Putman commented that data warehousing “projects are usually so large that if you approach them in a big-bang, OLTP management fashion, the foundational requirements of the thing change between inception and delivery.”
“I’ve seen very few data warehouses live up to the dream,” Dylan Jones commented. “I’ve always found that silos still persisted after a warehouse introduction because the turnaround on adding new dimensions and reports to the warehouse/mart meant that the business users simply had no option. I think data quality obviously plays a part. The business side only need to be burnt once or twice before they lose faith. That said, a data warehouse is one of the best enablers of data quality motivation, so without them a lot of projects simply wouldn’t get off the ground.”
“I just voted Outhouse too,” commented Paul Drenth, “because I agree with Dylan that the business side keeps using other systems out of disappointment in the trustworthiness of the data warehouse. I agree that bad data quality plays a role in that, but more often it’s also a lack of discipline in the organization which causes a downward spiral of missing information, and thus deciding to keep other information in a separate or local system. So I think usability of data warehouse systems still needs to be improved significantly, also by adding invisible or automatic data quality assurance, the business might gain more trust.”
“Great point Paul, useful addition,” Dylan responded. “I think discipline is a really important aspect, this ties in with change management. A lot of business people simply don’t see the sense of urgency for moving their reports to a warehouse so lack the discipline to follow the procedures. Or we make the procedures too inflexible. On one site I noticed that whenever the business wanted to add a new dimension or category it would take a 2-3 week turnaround to sign off. For a financial services company this was a killer because they had simply been used to dragging another column into their Excel spreadsheets, instantly getting the data they needed. If we’re getting into information quality for a second, then the dimension of presentation quality and accessibility become far more important than things like accuracy and completeness. Sure a warehouse may be able to show you data going back 15 years and cross validates results with surrogate sources to confirm accuracy, but if the business can’t get it in a format they need, then it’s all irrelevant.”
“I voted Data Warehouse,” commented Jarrett Goldfedder, “but this is marked with an asterisk. I would say that 99% of the time, a data warehouse becomes an outhouse, crammed with data that serves no purpose. I think terminology is important here, though. In my previous organization, we called the Data Warehouse the graveyard and the people who did the analytics were the morticians. And actually, that’s not too much of a stretch considering our job was to do CSI-type investigations and autopsies on records that didn’t fit with the upstream information. This did not happen often, but when it did, we were quite grateful for having historical records maintained. IMHO, if the records can trace back to the existing data and will save the organization money in the long-run, then the warehouse has served its purpose.”
“I’m having a difficult time deciding,” Corinna Martinez commented, “since most of the ones I have seen are high quality data, but not enough of it and therefore are considered Data Outhouses. You may want to include some variation in your survey that covers good data but not enough; and bad data but lots to sift through in order to find something.”
“I too have voted Outhouse,” Simon Daniels commented, “and have also seen beautifully designed, PhD-worthy data warehouse implementations that are fundamentally of no practical use. Part of the reason for this I think, particularly from a marketing point-of-view, which is my angle, is that how the data will be used is not sufficiently thought through. In seeking to create marketing selections, segmentation and analytics, how will the insight locked-up in the warehouse be accessed within the context of campaign execution and subsequent response analysis? Often sitting in splendid isolation, the data warehouse doesn’t offer the accessibility needed in day-to-day activities.”
Thanks to everyone who voted and special thanks to everyone who commented. As always, your feedback is greatly appreciated.
Can MDM and Data Governance save the Data Warehouse?
During last week’s Informatica MDM Tweet Jam, Dan Power explained that master data management (MDM) can deliver to the business “a golden copy of the data that they can trust” and I remarked how companies expected that from their data warehouse.
“Most companies had unrealistic expectations from data warehouses,” Power responded, “which ended up being expensive, read-only, and updated infrequently. MDM gives them the capability to modify the data, publish to a data warehouse, and manage complex hierarchies. I think MDM offers more flexibility than the typical data warehouse. That’s why business intelligence (BI) on top of MDM (or more likely, BI on top of a data warehouse that draws data from MDM) is so popular.”
As a follow-up question, I asked if MDM should be viewed as a complement or a replacement for the data warehouse. “Definitely a complement,” Power responded. “MDM fills a void in the middle between transactional systems and the data warehouse, and does things that neither can do to data.”
In his recent blog post “How to Keep the Enterprise Data Warehouse Relevant,” Winston Chen explains that the data quality deficiencies of most data warehouses could be addressed by MDM and data governance, which “can define and enforce data policies for quality across the data landscape.” Chen believes that the data warehouse “is in a great position to be the poster child for data governance, and in doing so, it can keep its status as the center of gravity for all things data in an enterprise.”
I agree with Power that MDM can complement the data warehouse, and I agree with Chen that data governance can make the data warehouse (as well as many other things) better. So perhaps MDM and data governance can save the data warehouse.
However, I must admit that I remain somewhat skeptical. The same challenges that have caused most data warehouses to become data outhouses are also fundamental threats to the success of MDM and data governance.
Thinking outside the house
Just as real outhouses were eventually obsolesced by indoor plumbing, I wonder if data outhouses will eventually be obsolesced, perhaps ironically by emerging trends of outdoor plumbing, i.e., open source, cloud computing, and software as a service (SaaS).
Many industry analysts are also advocating the evolution of data as a service (DaaS), where data is taken out of all of its houses, meaning that the answer to my poll question might be neither data warehouse nor data outhouse.
Although none of these trends obviates the need for data quality or alleviates the other significant challenges mentioned above, perhaps when it comes to data, we need to start thinking outside the house.