
Apache Iceberg — Open Table Format for Data Lake

28. 01. 2023 Updated: 27. 03. 2026 1 min read intermediate
This article was published in 2023. Some information may be outdated.

Apache Iceberg is an open table format for very large analytic datasets. Its defining features are hidden partitioning, schema evolution, and an engine-agnostic design.

Iceberg — Table Format

Iceberg was originally developed at Netflix for petabyte-scale datasets. It is engine-agnostic: Spark, Flink, and Trino can all work with the same tables.

Hidden Partitioning

CREATE TABLE catalog.db.orders (
    order_id BIGINT, customer_id BIGINT,
    order_date TIMESTAMP, total_czk DECIMAL(12,2)
) USING iceberg
PARTITIONED BY (days(order_date), bucket(16, customer_id));

-- Queries filter on the source column only; Iceberg applies the
-- partition transform itself and skips files that cannot match.
SELECT * FROM catalog.db.orders
WHERE order_date >= '2026-01-01';
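
To make the pruning step more concrete, here is a minimal Python sketch of the idea behind the days() transform. It is not Iceberg's implementation: the file names, the `files` mapping, and the `prune` helper are hypothetical, and the bucket() transform is left out. The only assumption taken from Iceberg is that a day partition value is the number of days since 1970-01-01.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def days_transform(ts: datetime) -> int:
    """Days since 1970-01-01, the idea behind Iceberg's days() transform."""
    return (ts.date() - EPOCH).days


# Hypothetical data files, each tagged with the day partition of its rows.
files = {
    "f1.parquet": days_transform(datetime(2025, 12, 30, 8, 0)),
    "f2.parquet": days_transform(datetime(2025, 12, 31, 23, 59)),
    "f3.parquet": days_transform(datetime(2026, 1, 1, 0, 0)),
    "f4.parquet": days_transform(datetime(2026, 1, 15, 12, 30)),
}


def prune(files: dict, lower_bound: datetime) -> list:
    """Keep files whose day partition may hold rows with order_date >= lower_bound."""
    min_day = days_transform(lower_bound)
    return sorted(name for name, day in files.items() if day >= min_day)


# The predicate is expressed on order_date; the transform is applied for the user.
selected = prune(files, datetime(2026, 1, 1))
print(selected)  # ['f3.parquet', 'f4.parquet']
```

The query author never mentions a partition column. The filter on order_date is converted into a filter on the derived day value, which is why the partition layout can stay hidden.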

Schema Evolution

ALTER TABLE catalog.db.orders ADD COLUMN discount DECIMAL(12,2);
-- Rename a column that exists in the table defined above.
ALTER TABLE catalog.db.orders RENAME COLUMN total_czk TO total_amount_czk;
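
Both statements are metadata-only changes because Iceberg tracks columns by a unique field ID rather than by name or position. The following Python sketch illustrates that idea under simplifying assumptions: the dictionaries, the helper functions, and the renamed column `total_amount_czk` are hypothetical, and real Iceberg stores schemas in table metadata files, not in Python dicts.

```python
# Hypothetical schema: field ID -> column name. IDs are stable once assigned.
schema_v1 = {1: "order_id", 2: "customer_id", 3: "order_date", 4: "total_czk"}

# A row from a data file written under schema_v1, keyed by field ID.
old_file_row = {1: 1001, 2: 42, 3: "2026-01-05", 4: "199.00"}


def add_column(schema: dict, name: str) -> dict:
    """Return a new schema with one more column under a fresh ID.

    Simplification: uses max ID + 1. Iceberg keeps a separate counter so that
    IDs of dropped columns are never reused.
    """
    new_schema = dict(schema)
    new_schema[max(schema) + 1] = name
    return new_schema


def rename_column(schema: dict, old: str, new: str) -> dict:
    """Return a new schema where only the name changes; the ID stays the same."""
    return {fid: (new if name == old else name) for fid, name in schema.items()}


def read_row(schema: dict, stored: dict) -> dict:
    """Resolve stored values by field ID; columns missing from the file read as None."""
    return {name: stored.get(fid) for fid, name in schema.items()}


schema_v2 = add_column(schema_v1, "discount")
schema_v3 = rename_column(schema_v2, "total_czk", "total_amount_czk")

# The old file is read with the new schema without rewriting any data.
row = read_row(schema_v3, old_file_row)
print(row)
```

The old row surfaces its total under the new name, and the added discount column reads as None, which mirrors how existing data files stay valid after a schema change.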

Comparison

  • Iceberg: multi-engine support, open standard
  • Delta Lake: closest integration with Spark and Databricks
  • Hudi: record-level upserts and change data capture (CDC)

Summary

Iceberg is a strong choice when several engines need to share the same data lake tables. Hidden partitioning and vendor neutrality set it apart from Delta Lake and Hudi.

apache iceberg, table format, data lake, open standard

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.