Data Quality with Great Expectations — Testing Data as Code

28. 02. 2022 | 1 min read | CORE SYSTEMS | development

“Why does the report show a negative number of customers?” — a question you don’t want to hear from the CEO. Data quality tests prevent bad data from reaching users.

Great Expectations

Great Expectations is a Python framework for data validation. You define “expectations” (assumptions about data) as code:

  • expect_column_values_to_not_be_null("customer_id")
  • expect_column_values_to_be_between("age", 0, 150)
  • expect_column_values_to_be_unique("email")
  • expect_table_row_count_to_be_between(1000, 1000000)
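
To make the meaning of these four expectations concrete, here is a minimal plain-Python sketch of the checks they describe. It deliberately does not call Great Expectations: the helper names (`not_null`, `between`, `unique`, `row_count_between`) and the sample rows are hypothetical, and skipping null values in the range and uniqueness checks is an assumption of this sketch. The row-count bounds are shrunk to fit the tiny sample.

```python
# Plain-Python stand-ins for the four expectations listed above.
# These helpers are hypothetical and are NOT the Great Expectations API.

def not_null(rows, column):
    """True if every row has a non-None value in `column`."""
    return all(row.get(column) is not None for row in rows)


def between(rows, column, low, high):
    """True if every non-null value in `column` lies in [low, high]."""
    return all(
        low <= row[column] <= high
        for row in rows
        if row.get(column) is not None
    )


def unique(rows, column):
    """True if no non-null value in `column` appears more than once."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


def row_count_between(rows, low, high):
    """True if the number of rows lies in [low, high]."""
    return low <= len(rows) <= high


# Made-up sample data: the second row has an impossible age,
# the third row repeats an email address.
customers = [
    {"customer_id": 1, "age": 34, "email": "a@example.com"},
    {"customer_id": 2, "age": -5, "email": "b@example.com"},
    {"customer_id": 3, "age": 51, "email": "a@example.com"},
]

results = {
    "customer_id not null": not_null(customers, "customer_id"),
    "age between 0 and 150": between(customers, "age", 0, 150),
    "email unique": unique(customers, "email"),
    "row count between 1 and 10": row_count_between(customers, 1, 10),
}

for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

In Great Expectations itself the same rules are declared once and evaluated by the framework, which also records what was observed for each failure.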

Pipeline Integration

In our Airflow DAG, validation runs after each ETL step. If any expectation fails, the pipeline stops and notifies the team, so bad data does not reach the analytics layer.
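
The stop-and-notify behaviour can be sketched without Airflow as a gate that raises an exception when a check fails. Everything here is illustrative: `DataValidationError`, `notify_team`, `validation_gate`, the three step functions, and the result dictionaries are hypothetical names, and the list `outbox` stands in for a real alert channel. In Airflow, an exception raised inside a task marks that task as failed, and downstream tasks using the default trigger rule are then not run, which is the effect this sketch imitates.

```python
class DataValidationError(Exception):
    """Raised when at least one expectation fails."""


def notify_team(message, outbox):
    # Stand-in for a real alert channel (email, chat webhook, ...).
    outbox.append(message)


def validation_gate(results, outbox):
    """Stop the pipeline if any check in `results` failed.

    `results` maps a check name to True (passed) or False (failed).
    """
    failed = sorted(name for name, passed in results.items() if not passed)
    if failed:
        notify_team("Validation failed: " + ", ".join(failed), outbox)
        raise DataValidationError(", ".join(failed))


executed = []  # records which ETL steps actually ran


def extract():
    executed.append("extract")
    return {"row count in range": True}


def transform():
    executed.append("transform")
    return {"age between 0 and 150": False}  # simulated failure


def load():
    executed.append("load")
    return {"email unique": True}


def run_pipeline(steps, outbox):
    """Run each step in order and validate its output before continuing."""
    for step in steps:
        validation_gate(step(), outbox)


outbox = []
try:
    run_pipeline([extract, transform, load], outbox)
    stopped = False
except DataValidationError:
    stopped = True

print("stopped:", stopped)
print("executed:", executed)
print("alerts:", outbox)
```

Because the gate raises instead of returning a flag, a later step cannot accidentally ignore a failed validation.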

Data Docs

Great Expectations generates an HTML report of validation results, called Data Docs, that shows which expectations passed, which failed, and why. We share it with business stakeholders as evidence of data quality.
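
As a rough illustration of the idea behind such a report, the sketch below turns a list of validation results into a small HTML table using only the standard library. It is not how Great Expectations builds Data Docs. The function `render_report` and the sample results are hypothetical.

```python
from html import escape


def render_report(results):
    """Render (name, passed, detail) tuples as a minimal HTML table."""
    rows = "\n".join(
        f"<tr><td>{escape(name)}</td>"
        f"<td>{'passed' if passed else 'failed'}</td>"
        f"<td>{escape(detail)}</td></tr>"
        for name, passed, detail in results
    )
    return (
        "<table>\n"
        "<tr><th>Expectation</th><th>Status</th><th>Detail</th></tr>\n"
        f"{rows}\n"
        "</table>"
    )


# Made-up results; the detail text is escaped so that characters
# such as "<" cannot break the generated markup.
results = [
    ("customer_id not null", True, ""),
    ("age between 0 and 150", False, "found -5 (< 0)"),
]

report = render_report(results)
print(report)
```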

Test Data, Not Just Code

We test code with unit tests. Data deserves the same approach — automated, versioned, and part of the pipeline.

Tags: data quality, great expectations, testing, dbt, data pipeline