How do data warehouses die?

From time to time I am invited to give a lecture on Business Intelligence to university students. When I was a student all the topics discussed in technology courses were about developing and delivering new systems. No one ever spoke about phasing out an existing one.

One smart guy once asked me: "When is a system implementation successful? - When it replaces at least one existing one." From this perspective I have not experienced a lot of "successful" BI/DWH implementations but I have met a lot of BI/DWH zombies. This idea led me to the question - "How does a data warehouse die?"


The alphabet soup of system acronyms

Imagine the information galaxy of a typical corporation with tens or hundreds of ERP, CRM, SCM, HRM, etc systems. An army of developers has connected the majority of them with data pumps and concentrated their content in one place - a data warehouse. More new systems are connected over time, very few old ones are decommissioned and disconnected.

What would happen if there was actual data printed on paper flowing through the pumps? There would be an overflowing pile of the stuff in the middle of the building. Finding any relevant piece of information in that heap would become more and more difficult and identification of a source nearly impossible. ENTROPY (a measure of disorder) creeps in.

enterprise information galaxy compressed into one point - data warehouse


Don't:   Replace an existing technology or build a newer, bigger warehouse. It usually does not solve a problem with disorder or data quality.

Do:   Build a customer-oriented information service to help people find answers and create trust in an existing warehouse.


Dead star

If you remember basic physics from high school, it is the nature of all closed systems to become more and more disorganized over time. This is true many times over for a data warehouse where changes from many source systems combine and contribute to its ever changing structure. There is, in fact, a threshold where chaos overtakes even the efforts of the hardiest data warehouse team and that's the point when the data warehouse starts to die. The first symptom is a loss of trust by the end users - they cease to believe in the reports coming from it.

Let's use a star classification system as an analogy to describe the possible end states of data warehouses.


Red dwarf

small, cold, hardly visible                    >       departmental data mart


Blue giant

large, very visible, but burned out       >       enterprise warehouse


Black hole       

sucks everything in, nothing gets out    >      warehouse(s) gone wild


You would be surprised how many of these "dead stars" are vegetating in enterprises in your hometown. Sometimes there are even several of them in one institution...


How to fight chaos

First of all, we need to admit that:

  • more data leads to more disorder

  • every warehouse is born and dies

There is only one remedy for disorder in data - proper documentation. Let me be crystal clear by defining documentation - it is a description of a data structure that is UNDERSTANDABLE by a third person without any previous knowledge of the system i.e. how a piece of data is technically calculated and what is its business meaning. When you miss either, technical or business definition, you do NOT have adequate documentation and ENTROPY creeps in.

Don't:   Create a world that does not exist - where there is only "the single version of truth" or pretend that a spreadsheet reporting world outside the DWH does not exist.

Do:   Describe the world as is - provide answers - the best you have.

No one would argue that there is a need for a budget, a plan, and documentation etc. when implementing a new data warehouse. The very same thing is needed for the decommissioning of an old warehouse. This is the moment when you either capitalize on the existing data models and report documentation or pay a high price with huge interest for all the documentation you've skipped or haven't updated over the years, because you have no clue what's inside, or who is using it and how to re-implement it in a new data warehouse.

I do not want too sound gloomy, but remember every time you close one eye and skip a documentation task, you invest into the grizzlier and earlier decommissioning of your data warehouse.