Data observability is the monitoring of data pipelines along five pillars: freshness, volume, schema, distribution, and lineage. The goal is to detect problems before business users see them.
Five Pillars of Data Observability
- Freshness — are the data current?
- Volume — did the expected number of records arrive?
- Schema — did the schema change?
- Distribution — are values in normal ranges?
- Lineage — what did the upstream outage affect?
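To make the five questions concrete, here is a minimal sketch of one check per pillar in plain Python. All function names, the 20% volume tolerance, and the example table names are assumptions chosen for illustration; a real tool would read these inputs from warehouse metadata rather than take them as arguments.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


def is_fresh(last_loaded_at, max_age, now=None):
    """Freshness: the newest record is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


def volume_ok(row_count, expected, tolerance=0.2):
    """Volume: row count is within +/- tolerance of the expected count."""
    return abs(row_count - expected) <= tolerance * expected


def schema_diff(expected, actual):
    """Schema: compare two {column: type} dicts and report what changed."""
    shared = expected.keys() & actual.keys()
    return {
        "added": sorted(actual.keys() - expected.keys()),
        "removed": sorted(expected.keys() - actual.keys()),
        "retyped": sorted(c for c in shared if expected[c] != actual[c]),
    }


def out_of_range(values, low, high):
    """Distribution: return the values that fall outside [low, high]."""
    return [v for v in values if not low <= v <= high]


def downstream(lineage, source):
    """Lineage: every table reachable from source in a {parent: [children]} map."""
    seen, queue = set(), deque([source])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Hypothetical lineage graph: which tables are affected if stg_orders breaks?
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_orders"],
    "fct_orders": ["dashboard"],
}
print(downstream(lineage, "stg_orders"))
```

The fixed thresholds keep the sketch short; observability tools typically learn the expected ranges from history instead of hard-coding them.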
Elementary: observability for dbt
# packages.yml
packages:
  - package: elementary-data/elementary
    version: 0.13.0

# models/schema.yml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: order_date
      - elementary.freshness_anomalies:
          timestamp_column: order_date
      - elementary.column_anomalies:
          column_name: total_czk
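Anomaly tests like the ones configured above generally compare the latest value of a metric (row count, load delay, a column statistic) with its recent history. The sketch below illustrates that idea with a simple z-score. It is not Elementary's implementation; the function name `is_anomalous`, the default sensitivity of 3 standard deviations, and the sample row counts are assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, sensitivity=3.0):
    """Flag latest if it is more than `sensitivity` standard deviations
    away from the mean of the historical values."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # history is constant, so any change stands out
    return abs(latest - mu) / sigma > sensitivity


# Hypothetical daily row counts for fct_orders
daily_rows = [1000, 1020, 980, 1010, 990]
print(is_anomalous(daily_rows, 400))   # sharp drop in volume
print(is_anomalous(daily_rows, 1005))  # ordinary day
```

A higher sensitivity produces fewer alerts at the cost of missing smaller deviations, so the threshold is usually tuned per table.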
Tools
- Monte Carlo — SaaS, ML-based anomaly detection
- Elementary — open-source, dbt-native
- Great Expectations + alerting — custom solution
Summary
Data observability lets the data team detect problems before the business notices them. The five pillars cover freshness, volume, schema, distribution, and lineage.