The Core is an important part of every Data Warehouse because it is used for the integration of data from different source systems. But is a Core really required in every DWH?
As long as we have only one source system and one Data Mart, there is no reason to have an integration layer, and the historical data can be stored directly in the Data Mart. But what happens when new requirements call for an additional Data Mart? Extending such a Data Warehouse is often quite complex, and many projects underestimate the effort needed to implement it. Therefore, a Core layer should be designed in every DWH architecture, even if it is not implemented in the first phase of a project.
At the moment I am preparing a review of an existing DWH that has no Core. From the project documentation I have seen so far, several Data Marts were implemented, each based on a single source system. There are only a few common dimensions, and the Data Marts are more or less independent of each other. Before I write the finding “Add a Core layer to your Data Warehouse” in the review paper, I will try to find out why no Core was designed in this DWH project. If there is a good reason for that approach, it is perfectly acceptable to skip a layer of the reference architecture.
Integration Layer without Integration
On the other hand, a “perfect” architecture is no guarantee of a good Data Warehouse. I remember another Data Warehouse that did have a Core, but even though it was called “Integration Layer” in that project, it was used for almost everything except data integration. Data from multiple source systems was stored in the DWH database, and a lot of effort went into keeping the history of all kinds of information. But when we started to implement the ETL jobs to load the first Data Mart, it took much longer than expected because the transformation rules were different for each source system. In the end, the ETL logic for the Data Mart was very complex and, even worse, it would have to be implemented again for every future Data Mart. A better approach would have been to implement the transformation rules for integrating the different sources in the ETL jobs that load the Core. Additional Data Marts would then have been much easier and cheaper to implement.
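To make the difference concrete, here is a minimal Python sketch of integrating in the Core load. It assumes two hypothetical source systems, `crm` and `erp`, that encode the same customer attributes differently; all source names, field names, code values, and function names (`load_core`, `customers_per_country`, `customers_per_gender`) are illustrative assumptions, not taken from the project described above. The source-specific transformation rules live in one place, the Core load, so each Data Mart reads harmonized records and needs no source-specific logic.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical raw rows from two source systems that encode the same facts differently.
CRM_ROWS = [
    {"cust_no": "C-001", "gender": "M", "country": "CH"},
    {"cust_no": "C-002", "gender": "F", "country": "DE"},
]
ERP_ROWS = [
    {"customer_id": 1001, "sex_code": 1, "land": "Switzerland"},
    {"customer_id": 1002, "sex_code": 2, "land": "Germany"},
]


@dataclass(frozen=True)
class CoreCustomer:
    """Harmonized customer record stored in the Core (integration layer)."""
    source: str
    customer_key: str
    gender: str       # unified domain: "male" / "female" / "unknown"
    country_iso: str  # unified domain: ISO 3166-1 alpha-2 code, "??" if unmapped


# Source-specific transformation rules (assumed mappings), implemented once in the Core load.
CRM_GENDER = {"M": "male", "F": "female"}
ERP_GENDER = {1: "male", 2: "female"}
ERP_COUNTRY = {"Switzerland": "CH", "Germany": "DE"}


def load_core(crm_rows, erp_rows):
    """Integrate both sources into one list of harmonized Core records."""
    core = []
    for r in crm_rows:
        core.append(CoreCustomer("crm", r["cust_no"],
                                 CRM_GENDER.get(r["gender"], "unknown"),
                                 r["country"]))
    for r in erp_rows:
        core.append(CoreCustomer("erp", f"E-{r['customer_id']}",
                                 ERP_GENDER.get(r["sex_code"], "unknown"),
                                 ERP_COUNTRY.get(r["land"], "??")))
    return core


# Data Marts read only from the Core, so they contain no source-specific rules.
def customers_per_country(core):
    """Example Data Mart 1: number of customers per country."""
    return dict(Counter(c.country_iso for c in core))


def customers_per_gender(core):
    """Example Data Mart 2: number of customers per gender."""
    return dict(Counter(c.gender for c in core))


if __name__ == "__main__":
    core = load_core(CRM_ROWS, ERP_ROWS)
    print(customers_per_country(core))  # {'CH': 2, 'DE': 2}
    print(customers_per_gender(core))   # {'male': 2, 'female': 2}
```

With this split, adding another Data Mart means writing one more function over the Core records, and adding a source means extending only `load_core`. In the situation described above, these mappings would instead be repeated in the load of every Data Mart.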
Blueprints and reference architectures are a good starting point for every DWH project. A good architecture is the foundation, but not a guarantee, of a successful Data Warehouse. Depending on the particular requirements, there can always be good reasons to use a different or adapted approach. As long as the deviations from the reference architecture can be justified, almost everything is allowed. That is what makes our consulting job interesting.