
Apache Airflow — Data Pipeline Orchestration in Practice

10. 07. 2025 · 1 min read · intermediate

Apache Airflow is the most widely used data pipeline orchestrator. It lets you define workflows as Python code, schedule their execution, and monitor their progress.

What is Apache Airflow

Airflow defines a workflow as a DAG (Directed Acyclic Graph): a graph of tasks (operators) connected by dependencies.

Core Concepts

  • DAG — the workflow itself, defined as Python code
  • Operator — a single task (e.g. BashOperator, PythonOperator, SQL operators)
  • Scheduler — triggers DAG runs based on cron expressions or presets such as @daily
  • Executor — runs the tasks: Local, Celery, or Kubernetes
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def extract_fn():
    ...  # pull raw data from the source system

def transform_fn():
    ...  # clean and aggregate the extracted data

def load_fn():
    ...  # write the result to the target store

with DAG(
    dag_id='daily_sales',
    schedule_interval='0 6 * * *',  # every day at 06:00
    start_date=datetime(2026, 1, 1),
    catchup=False,  # do not backfill runs before deployment
) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_fn)
    transform = PythonOperator(task_id='transform', python_callable=transform_fn)
    load = PythonOperator(task_id='load', python_callable=load_fn)
    extract >> transform >> load  # declare task dependencies
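The `extract >> transform >> load` line only declares dependencies; the scheduler later resolves the graph into an execution order. A minimal pure-Python sketch of that idea using Kahn's topological sort (names are illustrative, not Airflow internals):

```python
from collections import deque

def topo_order(deps):
    """Return a valid execution order for a task graph.
    deps maps each task to the set of upstream tasks it waits on."""
    indegree = {t: len(up) for t, up in deps.items()}
    downstream = {t: [] for t in deps}
    for t, up in deps.items():
        for u in up:
            downstream[u].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(deps):
        raise ValueError("cycle detected - not a DAG")
    return order

# extract >> transform >> load expressed as upstream sets
print(topo_order({"extract": set(),
                  "transform": {"extract"},
                  "load": {"transform"}}))
# -> ['extract', 'transform', 'load']
```

The acyclicity check is exactly why Airflow insists on a *directed acyclic* graph: a cycle would leave tasks waiting on each other forever.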

TaskFlow API (Airflow 2.x)

from airflow.decorators import dag, task
from datetime import datetime

@dag(schedule_interval='@daily', start_date=datetime(2026, 1, 1))
def sales_pipeline():
    @task()
    def extract():
        return fetch_data()   # fetch_data: your own extraction helper

    @task()
    def transform(data):
        return clean(data)    # clean: your own transform helper

    @task()
    def load(data):
        save(data)            # save: your own load helper

    # calling tasks wires dependencies and passes return values via XCom
    load(transform(extract()))

sales_pipeline()

Best Practices

  • Idempotence — re-running a task for the same date produces the same result
  • Atomicity — a task either succeeds completely or fails completely
  • XCom only for metadata — pass small values between tasks, never large datasets
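Idempotence and atomicity in a load task often mean replacing the run's output partition in a single transaction, so a retry cannot duplicate rows. A hedged sketch using SQLite (the `sales` table and column names are made up for illustration):

```python
import sqlite3

def load_sales(conn, run_date, rows):
    """Idempotent, atomic load: delete the run_date partition, then insert.
    Re-running for the same date yields the same final state."""
    with conn:  # one transaction: all-or-nothing
        conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (run_date, amount) VALUES (?, ?)",
            [(run_date, a) for a in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (run_date TEXT, amount REAL)")
load_sales(conn, "2026-01-01", [10.0, 20.0])
load_sales(conn, "2026-01-01", [10.0, 20.0])  # retry: no duplicates
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # -> 2
```

The same delete-then-insert (or upsert) pattern applies to warehouse loads keyed by the DAG run's logical date.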

Summary

Airflow is the de facto standard for data pipeline orchestration. The TaskFlow API simplifies DAG code; the keys to reliable pipelines are idempotent tasks and proper credential management.


CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.