Let’s imagine that the quality of the data you're using for your business decisions is problematic and improving it has become a high priority for your top management. What should you do and how should you set about doing it? I’m going to write a few blogs which I hope will provide pragmatic and practically orientated answers to these questions.
What success looks like
First, why should we care? Well, it's not rocket science: if the management and governance of data quality are improved within a company:
the company will be making decisions based on correct data (or at least, it knows when it's using unreliable data for its decision making).
the company will become more cost-effective in the gathering and communication of data quality information to its consumers and in the handling of data quality issues.
and finally, good data quality governance creates an environment for the continual improvement of the quality of important business information.
How do we do it? The natural place to start looking for answers is by asking the question “Who are the data quality improvements for?”. Which people in the company really need improvements in data quality and what should these improvements look like?
The following three groups are the people who relate most to data quality issues:
The consumers of DQ information: basically, these people want to know whether the data they're using for decision making is OK or not. (I use the very vague “OK” on purpose. Describing what's “OK” in data quality is a rather complex topic with many permutations. There's a nice blog post discussing the issue by Peter Hora (Data quality and business metadata combined?) for those interested. But let's not get sidetracked into it today.)
The people responsible for the daily updates of DQ information:
This group is mainly made up of the people responsible for the daily operations of the BI systems (e.g. the "stewards" of the data, ETLs, reports etc.). To do their jobs properly, they need an efficient application tool capable of multiple tasks:
allowing the monitoring of the data quality of the objects that fall under their responsibility.
providing an effective means of communicating with the business users about any data quality problems that are found, and about their solutions.
and finally, allowing the management of longer-term systematic improvements that prevent the problem from recurring (a terminological note: in ITIL terms we would speak of resolving the DQ “incident" and then managing the steps needed to remove the underlying “problem").
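To make the ITIL distinction concrete, here is a minimal sketch of what an incident record in such a tool might look like. All names (`DQIncident`, `escalate_to_problem`, the statuses) are my own illustration, not taken from any real product:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"          # the incident itself is fixed (data corrected)
    PROBLEM_RAISED = "problem"     # root cause is tracked as a separate "problem"

@dataclass
class DQIncident:
    object_name: str               # e.g. a table, an ETL job or a report
    description: str
    steward: str                   # the person responsible for the object
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: Status = Status.OPEN
    problem_id: Optional[str] = None   # link to the longer-term "problem" record

    def resolve(self) -> None:
        """Close the incident once the faulty data has been corrected."""
        self.status = Status.RESOLVED

    def escalate_to_problem(self, problem_id: str) -> None:
        """Record that a systematic, preventive fix is managed separately."""
        self.problem_id = problem_id
        self.status = Status.PROBLEM_RAISED

# A steward fixes today's bad load, but the recurring root cause
# is handed over to a separate problem record:
incident = DQIncident("sales_fact", "Duplicate order rows in daily load", "j.smith")
incident.escalate_to_problem("PRB-042")
```

The point of the split is visible in the two methods: `resolve` ends the day-to-day firefighting, while `escalate_to_problem` keeps the long-term improvement traceable.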
The people responsible for the development and change management processes of data, ETLs and reports:
Each company is a living organism whose data structures and reports are subject to never-ending change. The people responsible for these changes (e.g. everyone involved in the change management and development process) must have clear instructions about what data quality information must be provided with new or changed data objects. If these guidelines and rules are not strictly adhered to, the data quality information provided to business users will become unreliable or incomplete.
How to get there?
This week I want to give a brief overview of the barriers to Data Quality and the principles to achieving it.
Almost every company has already implemented data-quality controls in one form or another. They can live in many places (I call them checkpoints), typically in the DWH and ETLs, but sometimes also in primary systems and elsewhere. So what problems emerge from these checkpoints? Keeping our three main user groups in mind, the biggest ones are that:
they often aren't well coordinated - data stewards have a hard time keeping track of what has been checked, what the results were and when the checks took place.
their results often aren't available to all the relevant business users which means that trust in DQ and data falls.
they're often insufficiently documented - which again results in neither the business user nor the data steward being able to tell if the quality is good or bad.
they don't have a clear relationship to the company's business rules.
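As a thought experiment, the last three complaints largely disappear if every checkpoint is forced to carry its own documentation. Here is a minimal sketch of that idea; the names (`Checkpoint`, `run_checkpoint`, the example rule) are hypothetical, not a description of any existing tool:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List

@dataclass
class Checkpoint:
    name: str
    target: str          # which data object the check runs against
    business_rule: str   # the business rule it enforces - no more "unclear relationship"
    check: Callable[[List[dict]], bool]

def run_checkpoint(cp: Checkpoint, rows: List[dict]) -> Dict:
    """Run the check and return a self-describing result record
    that can be published to business users and stewards alike."""
    return {
        "checkpoint": cp.name,
        "target": cp.target,
        "business_rule": cp.business_rule,
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "passed": cp.check(rows),
    }

no_negative_amounts = Checkpoint(
    name="amounts_non_negative",
    target="invoice_lines",
    business_rule="An invoice line amount can never be negative",
    check=lambda rows: all(r["amount"] >= 0 for r in rows),
)

result = run_checkpoint(no_negative_amounts, [{"amount": 120.0}, {"amount": -5.0}])
# result["passed"] is False, and the record also says what was
# checked, against which object, when, and under which business rule.
```

Because every result record names its target, its timestamp and its business rule, the documentation and coordination problems above become much harder to reintroduce.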
A company can get over these hurdles by taking a number of initiatives:
Getting feedback from consumers: this is very important for data-quality projects. Often project teams focus on all kinds of automatic data-quality measurement systems and forget that the consumers of the data-quality information can provide very valuable feedback. Failure to set up effective communication channels between the consumers and the providers of the data-quality information not only results in complaining business users and an abundance of useless data-quality checks but also the failure to make use of a valuable and cheap resource.
Giving the DQ manager a strong mandate and powers. Combine that with assigning responsibility for the data quality of each data-object, ETL and report to the different individuals who deal with them and you're starting to create a network of checks you can trust.
Deploying an application tool that displays DQ check results and tracks DQ incidents.
I'm going to look in more detail at these three initiatives over the coming weeks but I'd like to finish with a quick note about "how" to take these steps:
No big bang: as in all other IT/BI projects, a big-bang style revolution in data quality handling most often leads to budget overruns and excessive spending, the painful postponing of deadlines and, often worst of all, a situation worse than the original state or, at the very best, no better.
Usually, a company has already taken some small steps in its search for DQ. These should be taken as the basis for further development. Processes that give slow but steady improvement should be prioritised ahead of the “destroy everything and build from scratch” approach.