
Great Expectations: automated data quality validation

08. 08. 2025 1 min read intermediate

Great Expectations lets you define, test, and document expectations about your data. It generates documentation automatically and integrates with Airflow, Spark, and pandas.

Why validate data quality

You declare rules (expectations) for your data, and Great Expectations checks them automatically on every pipeline run.

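import great_expectations as gx

# batch_request is assumed to be defined elsewhere and to point at the orders data
context = gx.get_context()
validator = context.get_validator(batch_request=batch_request)

# Every order has a unique ID and a non-null customer
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_not_be_null("customer_id")

# Order totals in CZK stay within a plausible range
validator.expect_column_values_to_be_between(
    "total_czk", min_value=0, max_value=10_000_000
)

# Persist the expectations so later pipeline runs can check against them
validator.save_expectation_suite()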
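
To make the three expectations concrete, here is a minimal sketch in plain Python of the conditions they assert. It is an illustration only, not how Great Expectations implements them: the helper names and the sample rows are hypothetical, and the range check is assumed to skip null values.

```python
# Plain-Python illustration of the three checks; this is not the Great Expectations API.
# The helper names and sample rows are hypothetical.

def values_are_unique(rows, column):
    """True when no value in `column` appears more than once."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def values_are_not_null(rows, column):
    """True when every row has a non-None value in `column`."""
    return all(row[column] is not None for row in rows)


def values_are_between(rows, column, min_value, max_value):
    """True when every non-null value lies in [min_value, max_value]."""
    return all(
        min_value <= row[column] <= max_value
        for row in rows
        if row[column] is not None
    )


def check_orders(rows):
    """Run all three checks and return a mapping of check name to pass/fail."""
    return {
        "order_id_unique": values_are_unique(rows, "order_id"),
        "customer_id_not_null": values_are_not_null(rows, "customer_id"),
        "total_czk_in_range": values_are_between(rows, "total_czk", 0, 10_000_000),
    }


good_rows = [
    {"order_id": 1, "customer_id": "C-17", "total_czk": 1_250},
    {"order_id": 2, "customer_id": "C-42", "total_czk": 89_990},
]
bad_rows = [
    {"order_id": 1, "customer_id": "C-17", "total_czk": 1_250},
    {"order_id": 1, "customer_id": None, "total_czk": -50},  # duplicate ID, null customer, negative total
]

print(check_orders(good_rows))
print(check_orders(bad_rows))
```

The first call reports every check as passing; the second reports all three as failing, which is the kind of batch a validation step should keep out of downstream processing.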

Integration with Airflow

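import great_expectations as gx

def validate_data():
    # Run the "daily_orders" checkpoint and fail the task if any expectation fails
    context = gx.get_context()
    result = context.run_checkpoint("daily_orders")
    if not result.success:
        raise ValueError("Data quality check failed!")

# validate_task is assumed to be an Airflow task (for example a PythonOperator)
# that calls validate_data; extract and transform are its neighbours in the DAG
extract >> validate_task >> transform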
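
The point of raising an exception in the validation task is that a failed task stops its downstream tasks from running under Airflow's default trigger rule, so bad data never reaches the transform step. The sketch below imitates that gating in plain Python, with no Airflow or Great Expectations involved. CheckpointResult and run_pipeline are hypothetical stand-ins, not part of either library.

```python
# Plain-Python stand-in for the gating behaviour; no Airflow or Great Expectations needed.
# CheckpointResult and run_pipeline are hypothetical names used only for illustration.
from dataclasses import dataclass


@dataclass
class CheckpointResult:
    """Minimal stand-in for a checkpoint result exposing a `success` flag."""
    success: bool


def validate_data(result):
    """Raise when the checkpoint did not succeed, as the validation task does."""
    if not result.success:
        raise ValueError("Data quality check failed!")


def run_pipeline(result):
    """Run extract -> validate -> transform in order, stopping at the first failure."""
    executed = []
    steps = [
        ("extract", lambda: None),
        ("validate", lambda: validate_data(result)),
        ("transform", lambda: None),
    ]
    for name, step in steps:
        try:
            step()
        except ValueError:
            executed.append(f"{name} (failed)")
            break  # downstream steps are skipped
        executed.append(name)
    return executed


print(run_pipeline(CheckpointResult(success=True)))
print(run_pipeline(CheckpointResult(success=False)))
```

With a successful result all three steps run; with a failed one the run stops at validation and the transform step is never reached.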

Summary

Great Expectations is a standard choice for automated data validation in Python pipelines.

Tags: great expectations, data quality, validation, testing

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.