When working with Data Vault, it is a good idea to generate the tables and load processes rather than develop them manually. biGENiUS, the Data Warehouse automation tool of Trivadis, can now generate Data Vault objects.
A main principle of Data Vault is to split business entities into separate parts: Hubs for the business keys, Links for the relationships between business entities, and Satellites for the historized attributes of an entity. A Link may refer to two or more Hubs, and multiple Satellites may refer to one Hub. Each Hub, Link, and Satellite is stored in a separate database table. A consequence of this modeling approach is that Data Vault models usually contain a high number of tables, and each table requires a corresponding ETL process. So, a high number of load processes must be implemented, but the number of distinct design patterns is very small: all loaders of one table type share the same structure.
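To make the split concrete, here is a minimal sketch of the three table types for a hypothetical customer/order example. The table and column names, data types, and default columns are my own illustrative assumptions, not biGENiUS output:

```sql
-- Hub: business key plus technical default columns (illustrative names)
CREATE TABLE HUB_CUSTOMER (
    CUSTOMER_H_ID  INTEGER      NOT NULL PRIMARY KEY,  -- surrogate key
    CUSTOMER_NO    VARCHAR(20)  NOT NULL UNIQUE,       -- business key
    LOAD_DATE      TIMESTAMP    NOT NULL,
    RECORD_SOURCE  VARCHAR(30)  NOT NULL
);

-- Link: relationship between two (or more) Hubs
CREATE TABLE LNK_CUSTOMER_ORDER (
    CUSTOMER_ORDER_L_ID INTEGER     NOT NULL PRIMARY KEY,
    CUSTOMER_H_ID       INTEGER     NOT NULL REFERENCES HUB_CUSTOMER,
    ORDER_H_ID          INTEGER     NOT NULL,          -- references a HUB_ORDER table
    LOAD_DATE           TIMESTAMP   NOT NULL,
    RECORD_SOURCE       VARCHAR(30) NOT NULL
);

-- Satellite: historized descriptive attributes of one Hub
CREATE TABLE SAT_CUSTOMER (
    CUSTOMER_H_ID  INTEGER      NOT NULL REFERENCES HUB_CUSTOMER,
    LOAD_DATE      TIMESTAMP    NOT NULL,
    CUSTOMER_NAME  VARCHAR(100),
    CITY           VARCHAR(50),
    PRIMARY KEY (CUSTOMER_H_ID, LOAD_DATE)
);
```

A real model repeats this trio for every business entity, which is exactly why the table count grows so quickly.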
In other words: a Data Vault model is a perfect candidate for a Data Warehouse Automation (DWA) tool. A generator avoids a lot of tedious development work and increases both productivity and quality, because careless errors are prevented. If you use Data Vault modeling in your Data Warehouse, it is strongly recommended to generate as much as possible, either with a specific generator or – even better – with a DWA tool.
The DWA tool biGENiUS was developed by Trivadis over the last few years and is continuously enhanced with new functionality. It generates DDL scripts for tables, views, and other database objects in the different Data Warehouse layers, as well as the ETL processes to load and transform the data from one layer to another. The generator is driven by metadata that the DWH developers define in the biGENiUS application. The metadata is stored in a repository called the biGENiUS MetaDB. The MetaDB also contains templates for the scripts to be created. Metadata and templates are the required input for the generator.
Different kinds of DWH objects can be defined (source objects, entities, dimensions, facts), containing information about business terms, business keys, relationships, dimensional hierarchies, source term mappings, etc. All this information is declared on a logical level, independently of the implementation. The technology-specific parts are defined in templates; a template is available for each component and object type to be generated. For each business entity (or source object, dimension or fact) defined in the biGENiUS application, several database objects in the Staging Area, Cleansing Area, Core and Data Marts are created. The templates contain the DDL statements for the database objects (e.g. tables, views, indexes, constraints, sequences). For each table, a loader (= ETL process) is generated, either as a PL/SQL package for Oracle or a Transact-SQL procedure for SQL Server. Other databases or tools could be used as well, as long as they provide some kind of scripting interface.
The metadata to feed the generator is defined in the biGENiUS application
It would go too far to describe all features of biGENiUS in this blog post. Here, I want to focus on generating a Data Vault model for the Core layer of a Data Warehouse.
In biGENiUS, you do not define Hubs, Links and Satellites, but Entities and Terms. An entity is a logical business concept, a term is a business attribute. Technical attributes such as surrogate keys, load dates or source system identifiers are defined globally as Default Columns, not for each entity separately. Connections between different business entities are defined as Relationships.
With this metadata, biGENiUS generates the required Data Vault objects. For each entity, a Hub is created. It consists of the terms that are marked as business keys (one or more columns) and the default columns defined in the Hub template. Links are created from the relationships between different entities. For each term, the user can specify via Custom Properties in which Satellite the term is stored. This provides the flexibility to split the Satellites by logical attribute set, by source system, or by change frequency.
The generated load processes for the Hubs, Links and Satellites contain INSERT or MERGE statements. They depend on the Data Flow Sets and Source Term Mappings defined in the metadata of the entity. Transformations of source terms are possible where required. The loaders for Hubs and Links are straightforward; the Satellite loaders contain additional statements for delta detection. All loaders are implemented as set-based SQL statements to provide good load performance.
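As a rough illustration of such a set-based Satellite loader with delta detection (a simplified sketch with assumed staging and Satellite table names; the actual generated PL/SQL or T-SQL code will look different), a new version is inserted only when the descriptive attributes differ from the current version:

```sql
-- Simplified Satellite loader: insert a new version only when the
-- descriptive attributes have changed. Delta detection compares the
-- staging row with the latest Satellite version per Hub key.
-- Table and column names are illustrative assumptions.
INSERT INTO SAT_CUSTOMER (CUSTOMER_H_ID, LOAD_DATE, CUSTOMER_NAME, CITY)
SELECT stg.CUSTOMER_H_ID,
       CURRENT_TIMESTAMP,
       stg.CUSTOMER_NAME,
       stg.CITY
  FROM STG_CUSTOMER stg
  LEFT JOIN (   -- current (latest) version per Hub key
         SELECT s.*
           FROM SAT_CUSTOMER s
          WHERE s.LOAD_DATE = (SELECT MAX(s2.LOAD_DATE)
                                 FROM SAT_CUSTOMER s2
                                WHERE s2.CUSTOMER_H_ID = s.CUSTOMER_H_ID)
       ) cur
    ON cur.CUSTOMER_H_ID = stg.CUSTOMER_H_ID
 WHERE cur.CUSTOMER_H_ID IS NULL               -- Hub key not yet in Satellite
    OR cur.CUSTOMER_NAME <> stg.CUSTOMER_NAME  -- changed attribute
    OR cur.CITY          <> stg.CITY;
```

In production code, the attribute comparison has to be NULL-safe (or based on a hash of all attributes), but the principle is the same: one set-based statement per Satellite, no row-by-row processing.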
As an interface between the Data Vault tables and the dimension and fact tables in the Data Mart layer, a set of views is generated that provides either the current versions or all intermediate versions of all objects. With these views it is possible to use the same Data Mart loaders for the different Core modeling approaches (Core with Dimensions and Facts, Core with Head & Version tables, Core with Data Vault).
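A minimal sketch of such a "current version" view, again with assumed table names rather than the generated ones, could look like this:

```sql
-- View exposing only the current version of each Satellite row,
-- joined with its Hub. The Data Mart loaders read from views like
-- this one instead of accessing the Data Vault tables directly.
CREATE VIEW V_CUSTOMER_CURRENT AS
SELECT h.CUSTOMER_NO,       -- business key from the Hub
       s.CUSTOMER_NAME,     -- current attribute values from the Satellite
       s.CITY,
       s.LOAD_DATE
  FROM HUB_CUSTOMER h
  JOIN SAT_CUSTOMER s
    ON s.CUSTOMER_H_ID = h.CUSTOMER_H_ID
 WHERE s.LOAD_DATE = (SELECT MAX(s2.LOAD_DATE)
                        FROM SAT_CUSTOMER s2
                       WHERE s2.CUSTOMER_H_ID = s.CUSTOMER_H_ID);
```

Because the Data Mart loaders only see this flattened interface, the Core layer can be remodeled (Dimensions and Facts, Head & Version tables, or Data Vault) without touching the downstream load logic.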
The suggested Data Warehouse architecture implemented in the biGENiUS templates follows the Trivadis DWH reference architecture, but can be adapted. Because the architecture of a Data Warehouse differs from one customer project to another, it is possible (and recommended) to adjust the project configuration and the biGENiUS templates to the specific needs, or even to create new templates. This is useful for changes such as different naming conventions or additional default columns, but also for additional DWH layers, for example to separate a Raw Data Vault from a Business Data Vault.
This blog post gives an overview of the functionality of a Data Warehouse automation tool. More information about biGENiUS can be found on the website bigenius.info. I will describe further implementation details in future blog posts. There are so many interesting subjects in every Data Warehouse project – whether it is manually developed or generated.