When it comes to data quality, I fervently believe that it
is destined for widespread adoption. As a concept, data quality has been around
for a while, but until now it’s only truly been appreciated by a group of aficionados. But just like taco trucks, the HBO show “In
Treatment,” video on demand, and Adam Lambert, data quality’s best days are actually
ahead of it.
Part of the reason data quality hasn’t yet hit its stride is
because it remains a difficult sell. Those of us in the business intelligence
and data integration communities understand that accurate and meaningful data
is a business issue. And well-intentioned though they may be, IT people have
gone about making the pitch the wrong way.
We—vendors, consultants,
and practitioners in the IT community—blather on about data quality being a business
issue and requiring a business case and a repeatable set of processes, but at
the end of the day automation remains at the center of most data quality discussions.
As we try to explain the ROI of name and address correction, deterministic matching,
multi-source data profiling, and the pros and cons of the cloud, business
executives are thinking two things:
1. “Jeezus, I’m bored.”
2. “I wonder how we would start something like this? Where would we begin?”
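That first reaction is understandable: the mechanics really are dry. For the curious, here’s a minimal sketch of what deterministic matching boils down to, in Python with hypothetical field names and sample records. Real matching engines handle far more fields and exceptions, but the principle is the same: two records match only when their normalized key fields agree exactly.

```python
# A minimal sketch of deterministic matching (hypothetical fields and
# sample records, not any vendor's actual implementation): two records
# are treated as the same customer only when their match key (normalized
# name plus postal code) agrees exactly. No fuzzy logic, no match scores.

def match_key(record):
    """Build an exact-match key: lowercase the name, collapse whitespace,
    and strip spaces from the postal code."""
    name = " ".join(record["name"].lower().split())
    postal = record["postal_code"].replace(" ", "")
    return (name, postal)

def deterministic_match(source_a, source_b):
    """Return pairs of records from two sources whose keys agree exactly."""
    index = {}
    for rec in source_a:
        index.setdefault(match_key(rec), []).append(rec)
    return [(candidate, rec)
            for rec in source_b
            for candidate in index.get(match_key(rec), [])]

crm = [{"name": "Ada  Lovelace", "postal_code": "SW1A 1AA"}]
billing = [{"name": "ada lovelace", "postal_code": "SW1A1AA"}]
print(deterministic_match(crm, billing))  # one match after normalization
```

Useful, yes. But try leading a budget meeting with it.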
In fact, the topic of scope is a gaping hole in the data
quality conversation. As we work with clients on setting up data governance, we
often use the bad reputation of corporate data as the pretext. We always,
always talk about the boundaries of the initial data quality effort. Unless you
can circumscribe the scope of data quality, you can’t quantify its value.
In our experience, there are five levels of data quality
delivery that can quickly establish not only the scope of an initial data
quality effort, but also the actual duties and resources involved in the
project.
By specifying the initial scope of the data to be corrected, we’re
establishing the boundaries of the effort itself. We’re also more likely to be
solving a real-life problem, which makes the initial win much more impactful
and secures stakeholder participation. Moreover, where we start our data
quality effort is not necessarily where we’ll finish, so we can ensure an
incremental approach to setting up the program and its roles.
Business executives and users can grasp a well-scoped
problem, especially if solving it makes their jobs easier or propels progress. And if we
solve it in a way that benefits the business—eliminating risk, ensuring economies
of scale, and driving revenues—we might even get budget for a data quality
tool!