The best documentation is the code itself, right?

Databases and reporting platforms have their structure and logic defined for them and embedded inside themselves. A computer needs to know how and what to show to a user. However, business intelligence deployments systematically suffer from a lack of documentation concerning their content (the meanings of terms, the origins and reliability of data and reports). Can or can't BI content documentation be generated from a reporting platform and underlying databases?

Detailed and easy-to-understand reporting documentation is the Holy Grail of any Business Intelligence owner but hard to achieve. Why should this be? Let's take a look at the stakeholders in the documentation process and their priorities:

Business users:

  • are fairly interested in creating documentation when defining their business requirements but not very detailed ones.

  • are very interested in consuming documentation, especially concerning regulatory reporting or when a number in a report turns red.

Technical users:

  • should be interested in creating documentation for life-cycle support, impact analyses and change management in general, however it's much faster to write a table and "create script" manually than model it properly. Documentation tasks are always the last in a project stream and time has usually run out when their turn comes  :(

  • are very interested in consuming documentation when supporting business users e.g. data origins, report justifications ...

As we can see, the real-life result is that some technical documentation is available, but it's not maintained, is out-of-date and there are virtually no business definitions whatsoever. "We need to do business and have no time to write definitions." Sounds familiar? ;). Where does that lead us in our search for the Holy Grail? Not very far.


The magic of reverse engineering

A lack of documentation leads sooner or later to problems:

  • a loss of trust in the system

  • the prohibitively high cost of making changes

  • the system becoming unsupportable

At this moment the search for a magic cure starts. The company starts to search for tools & services that can:

  • automatically load all data entities and reports into one place

  • automatically create mappings between sources and target tables/reports

  • manually document the meaning of entities within a fixed budget and time limit (consulting services)

And we're back where we started, searching for the Holy Grail. Or are we? Let's examine those points in a detail.


Data entities and reports

A modelling tool (e.g. PowerDesigner, Erwin ...) or a database engine provides a list of tables, columns and other structural information. To make this information useful you have to keep in mind that:

  • you need to publish the information in a searchable and easy to understand format

  • the table and column names might be clear to a database administrator, but will not be to a business analyst who sees them for the first time

  • there are no business definitions if no one wrote and entered them when the database was created - and you can bet that no one did!

Extracting tables and a simple report list can kick-start your documentation campaign and bring the first tangible results, literally within days. However, you need to add at least two more attributes manually:

  • the owner of the model/report

  • a brief business description of each table/report

In our experience you can create an in-depth inventory of an existing datawarehouse or datamart and up-to 200 reports in 2-3 weeks. So, BI content documentation can be generated from a reporting platform and underlying databases then? Well, the devil, as they say, is in the details.


Source to target mappings

Data models tell us what is stored in a system. Data mappings or "ETL's"* tells us where the data comes from and where it's heading to. In theory ETL's should be modelled and executed by ETL engines. However, in reality it's different. Even when an ETL engine is used, some jobs are just too complex or too big and a manual script does the loading much faster and more efficiently. It's not easy and sometimes plain impossible to reverse engineer the data flow from the code. For example imagine an analytical CRM data mart where an extract from a core system is done by PL/SQL and then segmentation is done by SAS procedures. Why is the customer in the report flagged as a potential lead for this campaign? IMHO no automaton can answer this question in detail. An automaton can give you a hint that the information comes from this particular system and at best indicate from which tables, but even that information is not always available.

To sum it up: Data mappings are the most complex metadata you can come across, do not expect to have a 100% detailed lineage map of data origins. You need to find a balance between the level of detail needed and the investment required to obtain and maintain it. And to make matters a worse, it gets a little more complex:


Business definitions

Let's say that we have successfully completed the previous tasks and have a perfect model of the data structures, described in technical terms. The developers and administrators are happy, but what about the business people who have paid for the documentation project? The value of the technical metadata to business users is very limited. Technical documentation does not answer the question about the MEANING of the data. Business people want to understand what that number means and what should be done when it is lower than a boss wants it to be. Technical metadata can't answer that.

So does that mean the answer is no, we can't generate BI content documentation from a reporting platform? Actually it doesn't.

An external consultant can help to define the meaning of the data but when s/he leaves and new reports/tables are created or existing ones are changed there'll be problems. In fact, a one-off complete system documentation will be out-of-date the second the consultant leaves your office. However, what happens if you hire a coach who can teach you HOW to define the meaning of data and set up the practice of continually reporting documentation instead of making a one-time run at it. Would that do it? In our experience, yes. It's very similar to feeding a hungry village by a river. You don't give them fish, you teach them how to fish!!.



Getting reports documented takes a while - there is no "magic wand" solution. Nevertheless you can get started and have tangible results in weeks. So in answer to our question, I would argue that yes, BI content documentation can be generated from a reporting platform and its underlying databases. But there's a condition - it requires team work and the involvement of both the end users and the BI team in filling in the business definitions that can't be extracted from the technology.  As with any other undertaking, the hardest thing is getting started. So, after reading this blog I challenge you to take 5 minutes and write down the most commonly used terms in your reports. Tomorrow, take 5 minutes after your lunch-break to review the list and draft one term-definition. The day after take 5 minutes to ...


*extract, transform and load jobs