
Modern Data Engineering Stack — From ETL to Real-Time Analytics

05. 10. 2021 · 2 min read · CORE SYSTEMS · data

If I had a crown for every ETL pipeline built on stored procedures in Oracle… The modern data engineering stack has changed dramatically in recent years. dbt, Airflow, Snowflake — tools that are transforming how we work with data. Here’s our stack and why we chose it.

The Old World: ETL in Oracle

Typical Czech enterprise: Oracle DB, stored procedures for transformations, scheduling via Oracle Scheduler or cron. Does it work? Yes. Is it maintainable? Barely. Version control? None. Testing? Non-existent. Documentation? Comments in PL/SQL (maybe).

ELT Instead of ETL

Paradigm shift: instead of transforming before loading (ETL), load raw data and transform in the target system (ELT). Why? Because modern warehouses (Snowflake, BigQuery) have enough compute for transformations. And raw data in a landing zone gives you the ability to reprocess when logic changes.

Our Stack

Ingestion: Airbyte for standard sources (databases, APIs, SaaS), custom Python scripts for specific sources. Data landing zone in Azure Blob Storage.
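
A minimal sketch of such a custom ingestion script, assuming a hypothetical source API and a landing container named "landing" (all names here are illustrative, not our actual pipelines):

# ingest_rates.py - illustrative only; the endpoint and names are made up
import json
import os
from datetime import datetime, timezone

import requests
from azure.storage.blob import BlobServiceClient

API_URL = "https://api.example.com/v1/rates"  # hypothetical source API
CONTAINER = "landing"                         # assumed landing-zone container

def ingest() -> None:
    # Pull the raw payload as-is: no transformation, pure ELT
    payload = requests.get(API_URL, timeout=30).json()

    # Partition the landing zone by load date to make reprocessing easy
    blob_path = f"rates/{datetime.now(timezone.utc):%Y/%m/%d}/rates.json"

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    blob = service.get_blob_client(container=CONTAINER, blob=blob_path)
    blob.upload_blob(json.dumps(payload), overwrite=True)

if __name__ == "__main__":
    ingest()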

Warehouse: Snowflake. Yes, it’s expensive. But: separation of storage and compute, auto-scaling, time travel, zero-copy cloning for dev/test environments. For an enterprise with many data consumers, it’s a game changer.
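
Zero-copy cloning deserves a concrete example: a full dev database is one metadata-only statement away. A minimal sketch via snowflake-connector-python, with placeholder credentials and database names:

# clone_dev.py - sketch; account, role and database names are placeholders
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",
)
try:
    # Zero-copy clone: only metadata is written, no data is duplicated
    conn.cursor().execute("CREATE OR REPLACE DATABASE ANALYTICS_DEV CLONE ANALYTICS")
finally:
    conn.close()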

Transformation: dbt (data build tool). SQL transformations versioned in Git, data testing (not null, unique, referential integrity), documentation generated from code. For SQL-native analysts, it’s a revolution.

Orchestration: Apache Airflow. DAGs in Python, rich operator ecosystem, UI for monitoring. On Kubernetes (KubernetesExecutor) for scaling.
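
To make that concrete, a sketch of a daily DAG chaining ingestion, dbt run, and dbt test (the schedule, paths, and commands are assumptions, not our production setup):

# dags/daily_elt.py - illustrative Airflow 2.x DAG; paths are made up
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_elt",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 5 * * *",  # every day at 05:00
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_sources",
        bash_command="python /opt/pipelines/ingest_rates.py",
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",
    )
    test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt_project && dbt test",
    )

    # Transform and test only after the raw data has landed
    ingest >> transform >> test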

BI: Metabase for self-service analytics (simple, free tier), Looker for power users. Metabase on top of Snowflake works surprisingly well for 90% of analytical queries.

dbt — Why It’s a Game Changer

dbt turns SQL transformations into a software engineering project: versions in Git, pull requests, code review, CI/CD, tests, documentation. The analyst writes a SELECT; dbt handles the materialization (view, table, incremental).

-- models/marts/finance/monthly_revenue.sql
SELECT
    date_trunc('month', order_date) AS month,
    SUM(amount) AS revenue,
    COUNT(DISTINCT customer_id) AS customers
FROM {{ ref('stg_orders') }}
WHERE status = 'completed'
GROUP BY 1
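
The {{ ref('stg_orders') }} call points at the upstream staging model; that is how dbt builds its dependency graph and runs models in the right order. dbt run then materializes the model in the warehouse, and dbt test checks the declared expectations against it.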

Real-Time: Kafka

For batch, Airflow + dbt is enough. For real-time events (transactions, IoT data, clickstream), Apache Kafka. Kafka → Snowflake (Snowpipe) for near-real-time analytics. Kafka → custom consumer for real-time alerting.
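
A minimal sketch of the alerting consumer using kafka-python; the topic name, broker address, and threshold rule are made up for illustration:

# alert_consumer.py - sketch; topic, brokers and threshold are assumptions
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transactions",                  # assumed topic name
    bootstrap_servers="kafka:9092",  # assumed broker address
    group_id="realtime-alerting",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    txn = message.value
    # Trivial rule for illustration; real alerting logic would be richer
    if txn.get("amount", 0) > 100_000:
        print(f"ALERT: large transaction {txn.get('id')} for {txn['amount']} CZK")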

Data Governance

GDPR in the Czech Republic is not optional. Data catalog (DataHub), column-level masking in Snowflake, audit logging of who queried what and when. For regulated industries (banking), it’s a prerequisite for operations.
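
What column-level masking looks like in practice, applied here through the Python connector (policy, table, and role names are placeholders):

# apply_masking.py - sketch; object and role names are placeholders
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
)
cur = conn.cursor()

# Mask e-mail addresses for everyone except a privileged role
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS mask_email AS (val STRING)
    RETURNS STRING ->
    CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END
""")
cur.execute("ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY mask_email")
conn.close()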

Data as a Product

The modern data stack isn’t just about tools — it’s about the approach. Data as a product, version-controlled transformations, automated testing. If you’re still writing stored procedures without version control and tests, it’s time for an upgrade.

data engineering · etl · airflow · dbt