Data Vault Book Recommendations

Data Vault is getting more and more popular for modeling Data Warehouses. Some of my colleagues asked me for book recommendations about this modeling method. Here a short review (from my personal point of view) of two Data Vault standard books.

Modeling the Agile Data Warehouse with Data Vault

Modeling the Agile Data Warehouse With Data Vault

This book of Hans Hultgren helped me to understand the concepts of Data Vault modeling and how to design a data model with Hubs, Links and Satellites. It gives a good overview of the core constructs of a Data Vault model and some specialities to be considered during the design of a Data Warehouse with Data Vault.

After an introduction of Ensemble modeling methods in general, the book explains the core constructs of Data Vault modeling in detail and how Hubs, Links and Satellites have to be designed. The modeling concepts are described with simple, but practical examples. Additional design topics about each type of Data Vault table are explained in further chapters, including design recommendations and advanced concepts.

The main focus of the book is data modeling and Data Warehouse architecture. Other topics such as data integration, loading data into and from Data Vault tables are mentioned, but the explanations of these subjects are not very detailed.

From my point of view, the book of Hans Hultgren is a very good introduction into the modeling concepts of Data Vault, very informative and easy to read. The concepts are clearly explained, and the examples are descriptive. The only lack is the missing index which makes it harder to find specific topics in the (printed) book. Of course, this is only an issue for the printed version, not for the eBook edition.

Hans Hultgren: Modeling the Agile Data Warehouse with Data Vault, November 2012, ISBN 978-0615723082 (Printed Edition / Kindle Edition)

Building a Scalable Data Warehouse with Data Vault 2.0

Building a Scalable Data Warehouse With Data Vault 2.0

To be honest, I was not very excited about the previous books of Dan Linstedt. They were quite hard for me to read. But his newest book that he wrote together with Michael Olschimke is very practical and contains a lot of useful implementation details.

The book starts with a good introduction to Data Warehousing and DWH architecture. Then it explains the methodology and modeling rules of Data Vault 2.0. Very useful are the additional chapters about intermediate and advanced Data Vault modeling topics.

The most interesting topics for me in this book are the chapters about loading Data Vault, implementing data quality and loading dimensional Information Marts. These chapters contain many practical implementation details and SQL examples that show how a Data Vault model can be loaded and used as a source for dimensional data models. The examples are implemented for SQL Server, but can easily be adapted to other database technologies (in my case, to Oracle).

Some of the chapters and examples with many screenshots I skipped, because they are very specific related to Microsoft tools that I don’t use. But nevertheless I recommend the book to everybody interested in how to implement Data Vault models and the corresponding load processes based on practical examples.

Dan Linstedt & Michael Olschimke: Building a Scalable Data Warehouse with Data Vault 2.0, October 2015, ISBN 978-0128025109 (Printed Edition / Kindle Edition)

Which Book Should You Buy?

So, which of the books is better? As a typical consultant, my answer is of course: “It depends”. If you are mainly interested in design topics of a Data Vault model, the book of Hans Hultgren is definitely a good choice to learn and understand the concepts of Data Vault. If you want to know more details about the implementation of a Data Vault model and the corresponding ETL processes, I recommend the book of Dan Linstedt and Michael Olschimke because of the useful implementation examples.

If you are interested in all topics about Data Vault, do it like me and buy both books. They are a very good combination.

4 thoughts on “Data Vault Book Recommendations

  1. Unfortunately, the majority of information about Data Vault has more theoretical nature and it’s quite challenging to find real practical business case when this approach really gives a measurable advantage.
    In particular, Data Vault is presented as an approach which gives more agility, flexibility; which makes it easier to combine data from multiple sources and add extensions to model, etc.
    On the other hand, it’s clear that this model is not suitable for reporting (so it’s definitely one more layer or intermediate step) and also it does not provide any significant improvements in loading data in compare to traditional star schema.
    Moreover, after detailed analysis, all “advantages” turn to be a strategy to delay proper design at later stages and start developing warehouse in “agile way”.
    Some people say that Data Vault, in particular, makes it very easy to add historical tracking. Well, in the most of cases, [slowly changing] dimensions in star schema also have all versions of attributes which need to be tracked. In the worst case scenario additional historical extension can be added (making star model more like snowflake model) to track those changes if this was not implemented properly in the past and has to be added right here, right now.
    With above being said, Data Vault appears to be more like additional layer which make it possible to delay proper star/snowflake to later stage with no particular benefits of loading data to warehouse or extracting data from it.
    Any specific examples when it’s really useful would be highly appreciated.

    Thanks

    Liked by 1 person

    • Hi Alex

      I agree with you that Data Vault is not suitable for reporting. It is used as a possible modeling method for a Core data warehouse, an Enterprise Data Warehouse or any kind of integration system for multiple source systems. For a simple data warehouse with a few stable sources and a handful of dimensional data marts, it usually makes no sense to split all the data into Hubs, Links and Satellites to rearrange them again. But for complex systems, Data Vault is a good alternative to a Core with a 3NF data model. From my point of view, it is easier to extend, and less migration effort is needed in case of model changes. Historical tracking is also a reason, but this can be implemented with other methods as well (e.g. “head-version modeling”, that I used often before I knew Data Vault).

      But to have a benefit of Data Vault, it is important to understand the data when creating the model and integrate the data from different sources. When Data Vault is only used to “copy” the source data into an additional Data Vault layer, as I have seen in some project reviews, it is a waste of effort. But if Hubs are created based on business entities (instead of source tables), the effort for data integration in further steps in much easier.

      Regards, Dani.

      Like

      • Dani,

        Right, it sounds like Data Vault can be considered as an alternative to 3NF. But frankly speaking I do not see fundamental differences DV vs 3NF. Well, clearly terminology is a bit different – «Hubs, Satellites, Links” instead of “Master, Detail, Link” but still data seems to be in 3NF.

        You have a good introduction video – https://www.youtube.com/watch?v=Q1qj_LjEawc. Maybe you can point to a couple of crucial differences in compare to 3NF in that specific example?

        Finally, even if DV is an alternative to 3NF (which is usually used in operational DB in oppose to DWH) it’s not clear why it’s claimed as data warehouse method.

        Please do not get me wrong, I’m not criticizing DV, I’m just trying to understand its strengths. Maybe I’m missing something and it’s very good for combining data from multiple sources, for lineage or something else. If so then it would be great to know which key difference (in compare to traditional approaches) helps to achieve that.

        Thanks

        Alex

        Like

    • Yes.. Vault does not solve a problem which DW can not. But the key differentiator for data vault is if you want to audit from multiple sources… Data as such .. No transformations.. Then Data Vault will make sense.. And if you have that specific need then DW can come vault and vault is an exceptionally good model for Auditing.. But for sure it needs user friendly layer …In that case as well!!!!

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s