Soda offers a YAML-based approach to data quality monitoring. Soda Checks Language lets you define checks without writing code.
Soda Checks Language

```yaml
# checks/orders.yml
checks for fact_orders:
  - row_count > 0
  - missing_count(order_id) = 0
  - duplicate_count(order_id) = 0
  - min(total_czk) >= 0
  - freshness(order_date) < 1d
  - values in (customer_id) must exist in dim_customers (customer_id)
```
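Soda translates SodaCL checks into SQL that runs inside the data source. The sketch below uses hand-written SQL to approximate what each of the six checks above asserts. These are not the queries Soda actually generates. It runs against an in-memory SQLite database using only Python's standard library. The sample rows, the `CHECKS` mapping and the `run_checks` helper are illustrative assumptions. So is measuring freshness as the age of the newest `order_date` relative to the current UTC time.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Hypothetical sample data. The last order row has three deliberate problems:
# a duplicate order_id, a negative total and a customer_id missing from dim_customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_orders (
        order_id INTEGER, customer_id INTEGER,
        total_czk REAL, order_date TEXT  -- ISO-8601 UTC timestamps
    );
""")
conn.executemany("INSERT INTO dim_customers VALUES (?)", [(1,), (2,)])

def ts(delta):
    # One fixed text format for every row, so MAX() on the column is chronological.
    return (now - delta).isoformat(timespec="seconds")

conn.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 250.0, ts(timedelta(hours=2))),
        (2, 2, 990.5, ts(timedelta(hours=5))),
        (2, 3, -10.0, ts(timedelta(days=3))),
    ],
)

# Check name -> (SQL returning a single value, predicate that must hold on it).
# Hand-written approximations of the SodaCL checks, not Soda's generated SQL.
CHECKS = {
    "row_count > 0": (
        "SELECT COUNT(*) FROM fact_orders",
        lambda v: v > 0,
    ),
    "missing_count(order_id) = 0": (
        "SELECT COUNT(*) FROM fact_orders WHERE order_id IS NULL",
        lambda v: v == 0,
    ),
    # Counts order_id values that occur more than once.
    "duplicate_count(order_id) = 0": (
        """SELECT COUNT(*) FROM (
               SELECT order_id FROM fact_orders
               WHERE order_id IS NOT NULL
               GROUP BY order_id HAVING COUNT(*) > 1)""",
        lambda v: v == 0,
    ),
    "min(total_czk) >= 0": (
        "SELECT MIN(total_czk) FROM fact_orders",
        lambda v: v is not None and v >= 0,
    ),
    # Freshness: the newest order_date must be less than one day old.
    "freshness(order_date) < 1d": (
        "SELECT MAX(order_date) FROM fact_orders",
        lambda v: v is not None and now - datetime.fromisoformat(v) < timedelta(days=1),
    ),
    # Reference check: anti-join counting orders whose customer_id has no match.
    "values in (customer_id) must exist in dim_customers (customer_id)": (
        """SELECT COUNT(*) FROM fact_orders o
           LEFT JOIN dim_customers c ON o.customer_id = c.customer_id
           WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL""",
        lambda v: v == 0,
    ),
}

def run_checks(conn, checks):
    """Return {check name: (measured value, passed)}."""
    results = {}
    for name, (sql, predicate) in checks.items():
        value = conn.execute(sql).fetchone()[0]
        results[name] = (value, predicate(value))
    return results

results = run_checks(conn, CHECKS)
for name, (value, passed) in results.items():
    print(f"{'PASS' if passed else 'FAIL'} {name} (measured: {value})")
```

For this data, the row count, missing-value and freshness checks pass. The duplicate, minimum and reference checks fail. Soda's own queries and its edge-case handling, such as empty tables, may differ. The duplicate query counts distinct duplicated values rather than duplicated rows. With an `= 0` threshold, that distinction does not change the outcome.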
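Running Checks

```python
from soda.scan import Scan

scan = Scan()
# Connection details for the "warehouse" data source come from a Soda
# configuration file; this path is an assumed example.
scan.add_configuration_yaml_file(file_path="configuration.yml")
scan.set_data_source_name("warehouse")
scan.add_sodacl_yaml_file("checks/orders.yml")
scan.execute()

# Report every failed check, then exit non-zero so a CI job or orchestrator fails the step.
if scan.has_check_fails():
    for c in scan.get_checks_fail():
        print(f"✗ {c.name}: {c.outcome}")
    raise SystemExit(1)
```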
Soda vs Great Expectations
- Soda: declarative YAML checks, quick to set up, executed as SQL inside the data source
- GX: Python API, better suited to complex, custom validations
Summary
Soda is a good fit when you want declarative, YAML-based data quality checks. It is quick to set up, and its scans are easy to integrate into existing pipelines.