Data Lakes
Apache Iceberg: an open table format for data lakes
Apache Iceberg is an open table format for very large datasets. Its key features are hidden partitioning, schema evolution, and an engine-agnostic design.
Iceberg as a table format
Netflix developed Iceberg to manage petabyte-scale datasets. It is engine-agnostic: Spark, Flink, and Trino can all work with the same tables.
Hidden partitioning
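-- Partition by day of order_date and by a 16-way hash bucket of customer_id
CREATE TABLE catalog.db.orders (
  order_id BIGINT,
  customer_id BIGINT,
  order_date TIMESTAMP,
  total_czk DECIMAL(12,2)
) USING iceberg
PARTITIONED BY (days(order_date), bucket(16, customer_id));

-- No need to know the partition layout: filter on the source column
-- and Iceberg prunes partitions from the predicate.
SELECT * FROM catalog.db.orders
WHERE order_date >= '2026-01-01';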
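To see what hidden partitioning does behind the scenes, the sketch below inserts two rows without naming any partition and then reads the table's partitions metadata table. The row values are made up for illustration, and the sketch assumes the catalog.db.orders table above lives in a Spark session with the Iceberg runtime configured.

```sql
-- Illustrative rows only; the writer supplies no partition values.
INSERT INTO catalog.db.orders VALUES
  (1, 101, TIMESTAMP '2026-01-15 12:00:00', 1299.00),
  (2, 102, TIMESTAMP '2025-12-20 12:00:00', 450.50);

-- Iceberg derived the day and bucket values itself; list them per partition.
SELECT * FROM catalog.db.orders.partitions;
```

The two rows should land in different day partitions, so the earlier query filtering on order_date >= '2026-01-01' would typically only need to read the files for the January partition.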
Schema evolution
-- status is added here only so the rename below has a column to act on
ALTER TABLE catalog.db.orders ADD COLUMNS (discount DECIMAL(12,2), status STRING);
ALTER TABLE catalog.db.orders RENAME COLUMN status TO order_status;
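Schema evolution covers more than adding and renaming columns. As a rough sketch, assuming the statements above have already run in Spark with Iceberg, the following reads the new column on old rows, widens a decimal, and drops a column. In Iceberg these are metadata changes, so existing data files are not rewritten.

```sql
-- Rows written before the column was added read back as NULL for discount.
SELECT order_id, total_czk, discount
FROM catalog.db.orders
WHERE discount IS NULL;

-- Widen decimal precision; the scale (2) has to stay the same.
ALTER TABLE catalog.db.orders ALTER COLUMN total_czk TYPE DECIMAL(14,2);

-- Drop a column. Iceberg tracks columns by ID, so a column added later
-- under the same name does not resurface the old values.
ALTER TABLE catalog.db.orders DROP COLUMN discount;
```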
Comparison
- Iceberg: multi-engine, open standard
- Delta Lake: tight Spark/Databricks integration
- Hudi: record-level upserts, CDC
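The multi-engine point can be made concrete by reading the same orders table from Trino instead of Spark. This is only a sketch: it assumes a Trino Iceberg catalog named iceberg (a hypothetical name) that points at the same metastore, with the db namespace visible as a Trino schema.

```sql
-- Trino SQL; the catalog name "iceberg" is an assumption for illustration.
SELECT customer_id, sum(total_czk) AS total_spent_czk
FROM iceberg.db.orders
WHERE order_date >= TIMESTAMP '2026-01-01 00:00:00 UTC'
GROUP BY customer_id;

-- Trino exposes Iceberg metadata through $-suffixed tables.
SELECT * FROM iceberg.db."orders$partitions";
```

No export or conversion step is involved; both engines read the same table metadata and data files.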
Summary
Iceberg is the preferred choice for a multi-engine data lake: hidden partitioning keeps queries independent of the physical layout, and the format is vendor-neutral.