Apache Iceberg is an open table format for massive datasets. Hidden partitioning, schema evolution, and engine-agnostic design.
Iceberg — Table Format¶
Netflix developed Iceberg for petabyte-scale datasets. Engine-agnostic — Spark, Flink, Trino.
Hidden Partitioning¶
CREATE TABLE catalog.db.orders (
order_id BIGINT, customer_id BIGINT,
order_date TIMESTAMP, total_czk DECIMAL(12,2)
) USING iceberg
PARTITIONED BY (days(order_date), bucket(16, customer_id));
-- No need to know the partitioning!
SELECT * FROM catalog.db.orders
WHERE order_date >= '2026-01-01';
Schema Evolution¶
ALTER TABLE catalog.db.orders ADD COLUMN discount DECIMAL(12,2);
ALTER TABLE catalog.db.orders RENAME COLUMN status TO order_status;
Comparison¶
- Iceberg — multi-engine, open standard
- Delta Lake — Spark/Databricks integration
- Hudi — record-level upserts, CDC
Summary¶
Iceberg is the preferred choice for multi-engine data lakes. Hidden partitioning and vendor neutrality set it apart.
apache icebergtable formatdata lakeopen standard