Soda offers a YAML-based approach to data quality monitoring. The Soda Checks Language (SodaCL) lets you define checks declaratively, without writing application code.
Soda Checks Language
# checks/orders.yml
checks for fact_orders:
  - row_count > 0
  - missing_count(order_id) = 0
  - duplicate_count(order_id) = 0
  - min(total_czk) >= 0
  - freshness(order_date) < 1d
  - values in (customer_id) must exist in dim_customers (customer_id)
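To make the meaning of these checks concrete, the sketch below expresses four of them as plain SQL against an in-memory SQLite table. It only illustrates what each metric measures. It is not the SQL that Soda generates, and the sample rows, the `METRIC_SQL` mapping, and the `compute_metrics` helper are assumptions made for this example. In particular, `duplicate_count` is read here as "number of distinct values that occur more than once"; Soda's exact definition may differ, although for a `= 0` threshold the pass/fail result is the same.

```python
import sqlite3

# Hypothetical sample data: (order_id, total_czk). One missing id, one duplicated id.
ROWS = [(1, 250.0), (2, 99.5), (2, 40.0), (None, 10.0)]

# Check metric -> SQL returning a single number (an approximation, not Soda's own SQL).
METRIC_SQL = {
    "row_count": "SELECT COUNT(*) FROM fact_orders",
    "missing_count(order_id)": (
        "SELECT COUNT(*) FROM fact_orders WHERE order_id IS NULL"
    ),
    "duplicate_count(order_id)": (
        "SELECT COUNT(*) FROM ("
        "SELECT order_id FROM fact_orders WHERE order_id IS NOT NULL "
        "GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
    "min(total_czk)": "SELECT MIN(total_czk) FROM fact_orders",
}


def compute_metrics(rows):
    """Load rows into an in-memory table and evaluate every metric query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_orders (order_id INTEGER, total_czk REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    metrics = {
        name: conn.execute(sql).fetchone()[0] for name, sql in METRIC_SQL.items()
    }
    conn.close()
    return metrics


metrics = compute_metrics(ROWS)
for name, value in metrics.items():
    print(f"{name} = {value}")
```

With this sample data, the `missing_count(order_id) = 0` and `duplicate_count(order_id) = 0` checks would fail, while `row_count > 0` and `min(total_czk) >= 0` would pass.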
Running a scan
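from soda.scan import Scan

# Assumes a data source named "warehouse" is already configured for Soda,
# for example in a configuration file loaded with scan.add_configuration_yaml_file().
scan = Scan()
scan.set_data_source_name("warehouse")
scan.add_sodacl_yaml_file("checks/orders.yml")
scan.execute()

# Report every failed check by name and outcome.
if scan.has_check_fails():
    for c in scan.get_checks_fail():
        print(f"✗ {c.name}: {c.outcome}")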
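In a scheduled job or CI step, the scan result usually needs to become a process exit code so that a failed check stops the pipeline. The following is a minimal sketch of that idea. The helpers `summarize_failures` and `exit_code_for` are hypothetical names introduced here, and they rely only on the `has_check_fails()` and `get_checks_fail()` methods and the `name` and `outcome` attributes already used in the snippet above.

```python
import sys


def summarize_failures(scan):
    """Return one formatted line per failed check of a scan-like object."""
    if not scan.has_check_fails():
        return []
    return [f"✗ {c.name}: {c.outcome}" for c in scan.get_checks_fail()]


def exit_code_for(scan):
    """Hypothetical helper: print failures to stderr; return 1 if any, else 0."""
    lines = summarize_failures(scan)
    for line in lines:
        print(line, file=sys.stderr)
    return 1 if lines else 0


# Usage in a pipeline step, after scan.execute():
#     sys.exit(exit_code_for(scan))
```

Keeping the formatting separate from the exit-code decision makes it easy to reuse the same summary lines in a log message or notification.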
Soda vs Great Expectations
- Soda: YAML, quick start, SQL-native
- GX: Python API, complex validation
Summary
Soda is well suited to a declarative, YAML-based approach to data quality. It offers a quick start and straightforward integration.