Data Observability — Monitoring Data Pipeline Health

Data observability is the monitoring of data pipelines across five pillars: freshness, volume, schema, distribution and lineage. The goal is to detect data problems before business users notice them.

Five Pillars of Data Observability

Freshness — are the data current?
Volume — did the expected number of records arrive?
Schema — did the schema change?
Distribution — are values in normal ranges?
Lineage — what did the upstream outage affect?
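
To make the pillars more concrete, here is a minimal sketch of how freshness, volume, schema and lineage checks could look as plain Python functions. It is an illustration under stated assumptions, not the way any particular tool implements them: the function names, the 20 % volume tolerance, and the asset names `raw_orders`, `stg_orders` and `revenue_dashboard` are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


def is_fresh(last_loaded_at, max_age, now=None):
    """Freshness: the newest record is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


def volume_ok(row_count, expected, tolerance=0.2):
    """Volume: row count is within +/- tolerance (a fraction) of the expected count."""
    return abs(row_count - expected) <= tolerance * expected


def schema_diff(expected, actual):
    """Schema: compare two {column: type} dicts and report what changed."""
    common = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "removed": sorted(set(expected) - set(actual)),
        "retyped": sorted(c for c in common if expected[c] != actual[c]),
    }


def downstream_of(lineage, failed):
    """Lineage: every asset reachable from the failed one, breadth-first."""
    affected, seen, queue = [], {failed}, deque([failed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Hypothetical example values, for illustration only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
print(is_fresh(last_load, timedelta(hours=12), now))  # True
print(volume_ok(row_count=700, expected=1000))        # False

expected_schema = {"order_id": "int", "total_czk": "numeric"}
actual_schema = {"order_id": "int", "total_czk": "text", "currency": "text"}
print(schema_diff(expected_schema, actual_schema))
# {'added': ['currency'], 'removed': [], 'retyped': ['total_czk']}

lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_orders"],
    "fct_orders": ["revenue_dashboard"],
}
print(downstream_of(lineage, "raw_orders"))
# ['stg_orders', 'fct_orders', 'revenue_dashboard']
```

Fixed thresholds like these are easy to reason about but need manual tuning, which is one reason observability tools tend to learn the expected ranges from history instead.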

Elementary — observability for dbt

# packages.yml
packages:
  - package: elementary-data/elementary
    version: 0.13.0

# models/schema.yml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: order_date
      - elementary.freshness_anomalies:
          timestamp_column: order_date
    columns:
      - name: total_czk
        tests:
          - elementary.column_anomalies
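
Anomaly tests of this kind generally compare the latest value of a metric (a daily row count, or the average of a column such as `total_czk`) with its recent history. One common way to do that is a z-score check, sketched below. This illustrates the general idea only and is not a description of Elementary's internals; the function name, the threshold of 3 and the row counts are made up for the example.

```python
import math
from statistics import mean, stdev


def zscore_anomaly(history, latest, threshold=3.0):
    """Return (is_anomaly, z) for the latest value against its history.

    history   -- past values of the metric, e.g. daily row counts
    latest    -- the value to evaluate
    threshold -- how many standard deviations count as anomalous
    """
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    diff = latest - mu
    if sigma == 0:
        # Flat history: any deviation at all is infinitely unusual.
        z = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        z = diff / sigma
    return abs(z) > threshold, z


# Made-up daily row counts for fct_orders over the past week.
daily_rows = [1000, 1020, 980, 1010, 990, 1005, 995]

print(zscore_anomaly(daily_rows, 1015)[0])  # False: within the normal spread
print(zscore_anomaly(daily_rows, 400)[0])   # True: far below the usual volume
```

The same function works for distribution metrics: feed it the daily average, null rate or maximum of a column instead of the row count.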

Tools

Monte Carlo — SaaS, ML-based anomaly detection
Elementary — open-source, dbt-native
Great Expectations + alerting — custom solution

Summary

Data observability aims to surface data problems before business users see them. Its five pillars cover freshness, volume, schema, distribution and lineage.

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.
