Apache Flink is among the most advanced engines for stream processing. Its core capabilities are exactly-once semantics, event-time processing, and built-in state management.
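Exactly-once state in Flink rests on checkpoints. The runtime periodically takes a consistent snapshot of operator state together with the position each source has read up to. After a failure, it restores the latest snapshot and replays input from that position.

The plain-Python sketch below models this for a single keyed counter. It makes simplifying assumptions: one operator, no checkpoint barriers, and no parallelism. The function `count_with_checkpoints` and its parameters are hypothetical and are not part of any Flink API.

```python
from collections import defaultdict


def count_with_checkpoints(events, checkpoint_every, fail_at=None, restore_state=True):
    """Hypothetical helper (not a Flink API): count records per key while
    checkpointing (source offset, state) every `checkpoint_every` records.

    If `fail_at` is set, the job "crashes" once just before reading that offset,
    rewinds the source to the last checkpointed offset and replays from there.
    With restore_state=False the counts are NOT rolled back, which models
    at-least-once processing (replayed records are counted twice).
    """
    state = defaultdict(int)      # keyed state: key -> count
    snapshot = (0, {})            # last checkpoint: (source offset, copy of state)
    offset = 0
    crashed = False
    while offset < len(events):
        if offset == fail_at and not crashed:
            crashed = True
            offset, saved = snapshot              # rewind the source to the checkpoint
            if restore_state:
                state = defaultdict(int, saved)   # roll state back to the same checkpoint
            continue
        state[events[offset]] += 1
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, dict(state))      # snapshot offset and state together
    return dict(state)
```

Consider `events = ["a", "b", "a", "c", "a", "b"]` with `checkpoint_every=2`. A crash at offset 5 still produces `{'a': 3, 'b': 2, 'c': 1}`, the same result as a failure-free run. With `restore_state=False`, the replayed `"a"` is counted twice, giving `{'a': 4, 'b': 2, 'c': 1}`.

This covers only the state inside the job. End-to-end exactly-once delivery also needs a replayable source and a transactional or idempotent sink.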
Why Flink
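Flink takes a stream-first approach: batch processing is treated as a special case of streaming, in which the input is a bounded stream that eventually ends. The same runtime therefore handles both real-time (unbounded) and historical (bounded) data.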
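Here is a small plain-Python illustration of that idea. It is not Flink code, and the function names are hypothetical.

A single incremental operator emits an updated result for every record. Running that operator over a bounded input and keeping its last update gives exactly the batch result. The same operator also accepts an unbounded input.

```python
from itertools import islice, repeat
from typing import Iterable, Iterator, List


def streaming_sum(amounts: Iterable[float]) -> Iterator[float]:
    """Hypothetical incremental operator: emit the updated running total after each record."""
    total = 0.0
    for amount in amounts:
        total += amount
        yield total


def batch_sum(amounts: List[float]) -> float:
    """Batch view: run the same streaming operator over a bounded input.
    When the input ends, the last emitted update is the final result."""
    result = 0.0                      # result for an empty input
    for result in streaming_sum(amounts):
        pass                          # keep only the latest update
    return result


# On an unbounded input the operator never finishes, so we only look at the first updates.
first_three = list(islice(streaming_sum(repeat(1.0)), 3))
```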
Flink SQL
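```sql
CREATE TABLE orders (
  order_id   STRING,
  amount     DECIMAL(10, 2),
  order_time TIMESTAMP(3),
  -- event-time attribute: tolerate events arriving up to 5 seconds out of order
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',  -- placeholder broker address (assumption)
  'properties.group.id' = 'orders-demo',              -- hypothetical consumer group id
  'format' = 'json'
);

-- order count and revenue per 5-minute tumbling event-time window
SELECT
  TUMBLE_START(order_time, INTERVAL '5' MINUTE) AS window_start,
  COUNT(*) AS order_count,
  SUM(amount) AS revenue
FROM orders
GROUP BY TUMBLE(order_time, INTERVAL '5' MINUTE);
```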
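To see what the watermark and the tumbling window do together, here is a simplified plain-Python model of the query's semantics. It is a sketch, not Flink's implementation, and it rests on these assumptions:

- Event times are in milliseconds.
- The watermark equals the largest event time seen so far minus 5 seconds.
- A window `[start, start + 5 min)` is emitted once the watermark reaches its last millisecond.
- A record whose window the watermark has already passed is treated as late and dropped.

The function `tumbling_windows` and its `flush` flag are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

WINDOW_MS = 5 * 60 * 1000   # 5-minute tumbling windows, matching INTERVAL '5' MINUTE
MAX_DELAY_MS = 5 * 1000     # watermark lags the max event time by 5 s, matching the DDL


def tumbling_windows(events, flush=False):
    """Hypothetical model of event-time tumbling windows with a bounded-delay watermark.

    events: (event_time_ms, amount) pairs in arrival order (possibly out of order in time).
    Returns (window_start_ms, order_count, revenue) tuples in the order windows fire.
    If flush is True, windows still open at end of input are emitted, mimicking the
    final watermark a bounded source produces when it finishes.
    """
    open_windows = defaultdict(lambda: [0, Decimal(0)])  # window start -> [count, revenue]
    watermark = None
    results = []
    for ts, amount in events:
        start = ts - ts % WINDOW_MS                # align to the window boundary
        last_ts = start + WINDOW_MS - 1            # last millisecond covered by the window
        if watermark is not None and last_ts <= watermark:
            continue                               # late: the watermark already passed this window
        acc = open_windows[start]
        acc[0] += 1
        acc[1] += amount
        candidate = ts - MAX_DELAY_MS
        watermark = candidate if watermark is None else max(watermark, candidate)
        for s in sorted(open_windows):             # emit every window the watermark has passed
            if s + WINDOW_MS - 1 <= watermark:
                count, revenue = open_windows.pop(s)
                results.append((s, count, revenue))
    if flush:
        for s in sorted(open_windows):
            count, revenue = open_windows[s]
            results.append((s, count, revenue))
    return results
```

Suppose orders arrive at 10 s, 200 s, 301 s, 298 s, 306 s and 299 s, in that order. The 298 s order is still counted, because the watermark is only at 296 s when it arrives. The first window is emitted when the 306 s order pushes the watermark to 301 s. The 299 s order that follows is therefore dropped as late.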
Comparison
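- Flink: true record-at-a-time streaming, typically with the lowest latency
- Spark Streaming: micro-batch processing, a hybrid of the batch and streaming models
- Kafka Streams: a library embedded in your application rather than a separate cluster, best suited to simpler transformations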
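The latency gap between record-at-a-time and micro-batch processing can be seen with a deliberately simplified model. The functions are hypothetical; this is neither a benchmark nor how either engine is implemented.

In the micro-batch case, a record waits for its batch interval to close before processing starts, so its latency can approach a full interval. A record-at-a-time engine handles the record on arrival.

```python
def record_at_a_time(arrivals_ms, processing_ms):
    """Hypothetical model: each record is processed on arrival; result ready after processing_ms."""
    return [t + processing_ms for t in arrivals_ms]


def micro_batch(arrivals_ms, interval_ms, processing_ms):
    """Hypothetical model: records are buffered into intervals [k*interval, (k+1)*interval);
    each batch starts when its interval closes and finishes processing_ms later."""
    return [(t // interval_ms + 1) * interval_ms + processing_ms for t in arrivals_ms]


def latencies(arrivals_ms, ready_ms):
    """Per-record latency: time the result is ready minus the record's arrival time."""
    return [r - a for a, r in zip(arrivals_ms, ready_ms)]
```

Take a 1-second interval and 10 ms of processing. A record arriving at the start of an interval waits about a second, while per-record latency stays at 10 ms. Real latencies also depend on network, serialization, and scheduling overheads.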
Summary
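Flink is the top choice when a pipeline needs low latency together with exactly-once processing. Flink SQL makes streaming accessible to analysts, who can express windowed aggregations like the one above in familiar SQL.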