Data Vault 2.0 is modeling approach for agile, scalable and auditable enterprise warehouse. Hubs, links and satellites enable parallel development.
Three Basic Entities¶
- Hub — business keys (immutable core)
- Link — relationships between hubs
- Satellite — descriptive attributes with history
CREATE TABLE hub_customer (
hub_customer_hk CHAR(32) PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL,
load_date TIMESTAMP NOT NULL,
record_source VARCHAR(100) NOT NULL
);
CREATE TABLE sat_customer_details (
hub_customer_hk CHAR(32) REFERENCES hub_customer,
load_date TIMESTAMP NOT NULL,
hash_diff CHAR(32) NOT NULL,
customer_name VARCHAR(200),
email VARCHAR(200),
segment VARCHAR(50),
PRIMARY KEY (hub_customer_hk, load_date)
);
CREATE TABLE link_customer_product (
link_hk CHAR(32) PRIMARY KEY,
hub_customer_hk CHAR(32),
hub_product_hk CHAR(32),
load_date TIMESTAMP NOT NULL
);
Advantages¶
- Parallel development — independent teams
- Complete history — every change in satellites
- Auditability — record_source on every record
Summary¶
Data Vault is ideal for enterprise with many sources and auditability requirements.
data vaultdata modelingenterprisewarehouse