Apache Iceberg: Open Table Format for Data Lakes

Apache Iceberg is an open table format for very large analytic datasets. Its main features are hidden partitioning, schema evolution, and an engine-agnostic design.

Iceberg: Table Format

Iceberg was originally developed at Netflix for petabyte-scale datasets and is now an Apache project. It is engine-agnostic: Spark, Flink, and Trino can all work with the same tables.

Hidden Partitioning

CREATE TABLE catalog.db.orders (
    order_id BIGINT, customer_id BIGINT,
    order_date TIMESTAMP, total_czk DECIMAL(12,2)
) USING iceberg
PARTITIONED BY (days(order_date), bucket(16, customer_id));

-- No need to know the partitioning: filter on the source column and
-- Iceberg maps the predicate to the days(order_date) partitions.
SELECT * FROM catalog.db.orders
WHERE order_date >= '2026-01-01';
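
The query above never mentions the partition layout. The following Python sketch illustrates, under simplifying assumptions, why that can still be efficient: each data file carries the partition value derived from order_date, so a filter on the column can be turned into a filter on files. Only the days-since-epoch calculation mirrors the days transform of the Iceberg spec; the DataFile class, the prune function, and the file names are hypothetical and are not Iceberg APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_transform(ts: datetime) -> int:
    """Whole days since 1970-01-01 UTC, the value a days() partition stores."""
    return (ts - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a manifest entry: a path plus its partition value."""
    path: str
    order_day: int  # days_transform(order_date) shared by every row in the file


def prune(files: list[DataFile], min_order_date: datetime) -> list[DataFile]:
    """Keep only files that may contain rows with order_date >= min_order_date."""
    min_day = days_transform(min_order_date)
    # A file from the boundary day is kept: it may hold matching rows.
    return [f for f in files if f.order_day >= min_day]


def utc(year: int, month: int, day: int, hour: int = 0) -> datetime:
    """Small helper to build timezone-aware sample timestamps."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# Sample files (made up for illustration), one per order day.
files = [
    DataFile("orders-a.parquet", days_transform(utc(2025, 12, 30, 9))),
    DataFile("orders-b.parquet", days_transform(utc(2025, 12, 31, 23))),
    DataFile("orders-c.parquet", days_transform(utc(2026, 1, 1, 0))),
    DataFile("orders-d.parquet", days_transform(utc(2026, 1, 2, 14))),
]

# Equivalent of: WHERE order_date >= '2026-01-01'
kept = prune(files, utc(2026, 1, 1))
print([f.path for f in kept])  # ['orders-c.parquet', 'orders-d.parquet']
```

The caller only supplies a timestamp, never a partition value, which is the point of hidden partitioning. A real engine would additionally use per-file column statistics to filter rows inside the boundary day.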

Schema Evolution

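-- Add a column: a metadata-only change, no data files are rewritten.
ALTER TABLE catalog.db.orders ADD COLUMN discount DECIMAL(12,2);
-- Rename an existing column of the orders table: also metadata-only.
ALTER TABLE catalog.db.orders RENAME COLUMN total_czk TO total_amount_czk;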
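
Adding or renaming a column does not rewrite data because Iceberg tracks every column by a unique field ID rather than by name or position. The Python sketch below illustrates that idea only; the Schema class, its methods, and the sample row are hypothetical and far simpler than Iceberg's actual table metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical schema: maps stable field IDs to the current column names."""
    names_by_id: dict[int, str] = field(default_factory=dict)
    last_id: int = 0

    def add_column(self, name: str) -> int:
        """Register a new column under a fresh, never reused field ID."""
        if name in self.names_by_id.values():
            raise ValueError(f"column already exists: {name}")
        self.last_id += 1
        self.names_by_id[self.last_id] = name
        return self.last_id

    def rename_column(self, old: str, new: str) -> None:
        """Change the name attached to a field ID; stored data is untouched."""
        for field_id, name in self.names_by_id.items():
            if name == old:
                self.names_by_id[field_id] = new
                return
        raise KeyError(old)

    def read(self, stored_row: dict[int, object]) -> dict[str, object]:
        """Project a row stored by field ID onto the current column names."""
        return {name: stored_row.get(fid) for fid, name in self.names_by_id.items()}


schema = Schema()
order_id = schema.add_column("order_id")
total = schema.add_column("total_czk")

# A row written before the schema changed, keyed by field ID.
old_row = {order_id: 1001, total: 2490.0}

schema.add_column("discount")                           # like ADD COLUMN
schema.rename_column("total_czk", "total_amount_czk")   # like RENAME COLUMN

print(schema.read(old_row))
# {'order_id': 1001, 'total_amount_czk': 2490.0, 'discount': None}
```

The old row is read correctly under the new names, and the column added later simply comes back as a null value, without touching the stored data.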

Comparison

Iceberg: multi-engine support, open standard
Delta Lake: tight Spark and Databricks integration
Hudi: record-level upserts and change data capture (CDC)

Summary

Iceberg is the preferred choice when several engines must share one data lake. Hidden partitioning and vendor neutrality set it apart.

CORE SYSTEMS team
