Mapping out the practices used to access reference data in companies isn't difficult. But how is it used and isn't there a way to make it more accessible and useful than it is in the current status quo?
There's no question that reference data is valuable. It makes data across applications consistent and comparable. When a data warehouse or any other data integration project is introduced for the first time and users start to see the inconsistencies in the company's data, then they start to understand why reference data is so important. The first step in the long journey to achieving company-wide data quality is making reference data transparent.
Initially however, the users actually need easy access to it and there are several ways to do this:
In an application
Reference data can usually be viewed in drop-down menus, for instance a list of countries, currencies etc. This is their primary use case - to make sure that data fields are filled-in correctly. This view is not useful for anything other than inputting. You can't export filter rearrange or do anything else with them in the application.
The value reference list for issue types in our internal Jira issue tracker highlighted by a red rectangle.
If a company has a data warehouse, there's a good chance that there'll be reference data present there and a user can query it or create a report from it. In this scenario reference data accessibility is much higher, but a user must know in advance where to look for it as there's no search option.
A wikipedia report - the list of Bank holidays in UK
Reference data management
More data-mature companies deploy a tool for reference-data management - a single point from which one can access, view and edit reference data across systems and applications. These dedicated tools are primarily designed for a data steward to manage and update reference-data values and a limited number of users have access. They're not very useful for viewing and understanding values per se. That role is taken by business dictionaries and unfortunately the two systems aren't integrated.
Metadata repositories typically include the data models of databases and the data flows between them i.e. descriptions of data structures, but NO DATA. The reason for this is fairly simple - security. It's desirable to publish a list of all the available data sources in one place, but definitely not desirable to publish their content too. The complexity of the permission management would be overwhelming.
However, when you document a product dimension say, or the account dimension of a general ledger and the documentation has a table with a description of all products, with a structure something like this:
valid from, valid to …
the information is hmm, at best useless to any analyst or product manager. What users actually want to see is a list of the products a company actually has. But this requirement is in direct conflict with the above-mentioned security policies. Metadata repositories - just metadata, NO DATA. Sounds like catch 22, doesn’t it? :)
Data model - what room types are there? what types of charges?
Reference data as metadata
Is there an alternative route? My suggestion is to classify reference data as metadata and include it in the metadata repository. It breaks the existing paradigm of the strict separation of data and metadata, but makes a lot of sense.
Reference-data values IMHO belong in a business dictionary.
Let's look at an example: a mobile operator sells its services through three channels (shops, online e-shops, partners). Each channel has its own application. From a data-governance perspective, three tools might be relevant:
A reference-data management tool that will make sure that the list of tariff names and codes are identical in all applications.
A data dictionary which documents the data models and allows analysts to access the data and report it.
A business dictionary which defines different types of tariffs in a detailed business description like in a wikipedia.
If these data-governance tools are not integrated, there's no single place where it's possible to get a complete set of information about a specific tariff (what it is, where it's stored, who owns it).
Having said that, how would it work?
- Reference-data tables would be included in the data dictionary just like other tables.
All the content of the reference-data tables would be accessible in read-only mode directly in the data dictionary.
Reference-data tables would be linked to any reports which are based on their values.
Reference-data tables would be linked to an RDM tool which would manage their content.
The reference-data tables most important values would be in the business dictionary as terms.
This approach allows the creation of a single Information Touchpoint that makes it possible:
to see, not only the structure of the reference data, but all the content of the reference-data tables as well
for anyone to search and access reference-data values
to discover inconsistencies in reference data between systems
to do all of the above without a large integration effort (the reference data stays where it is, and it's only visible from one place)
We use data and metadata everyday in our work places. But they're always separated, removed from each other and we never make use of them together. I personally feel that that's a huge opportunity wasted. With a small investment and the thoughtful software I've sketched out here, it's possible to leverage so much more potential from their synergy. If you want to know more about the Semanta product Encyclopaedia which does just that, then don't hesitate to contact us me. I'd be more than happy to explain the details.
Preview reference data values directly in Encyclopaedia Data Dictionary
Ataccama RDM - reference data enriched with definitions loaded directly from Encyclopaedia Business Dictionary