Real-time Analytics
Yesterday's data means yesterday's decisions.
We build real-time processing platforms with Apache Kafka, Flink and streaming analytics. Sub-second latency for use cases where batch processing isn't enough — fraud detection, dynamic pricing, IoT telemetry, live dashboards.
When batch processing isn’t enough
Batch processing (daily/hourly ETL) is the right choice for most analytical use cases. But there are scenarios where delays cost real money:
Fraud Detection
Fraudulent transactions must be detected in seconds, not hours. Real-time scoring: transaction → enrichment (customer history, geolocation, device fingerprint) → ML model → approve/decline. Latency >5s = approved fraud.
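The scoring path above can be sketched in miniature. Everything here — `enrich`, `score_transaction`, the 0.8 decline threshold — is hypothetical stand-in logic to show the shape of the pipeline, not a production model:

```python
import time

DECLINE_THRESHOLD = 0.8  # illustrative cut-off, not a real tuned value

def enrich(txn, customer_history):
    # Attach the features the model needs (customer history, geo,
    # device fingerprint); here just a 24h transaction count.
    features = dict(txn)
    features["txn_count_24h"] = customer_history.get(txn["customer_id"], 0)
    return features

def score_transaction(features):
    # Stand-in for an ML model: flag high amounts from unknown customers.
    risk = 0.0
    if features["amount"] > 1000:
        risk += 0.5
    if features["txn_count_24h"] == 0:
        risk += 0.4
    return risk

def decide(txn, customer_history):
    start = time.monotonic()
    features = enrich(txn, customer_history)
    risk = score_transaction(features)
    decision = "decline" if risk >= DECLINE_THRESHOLD else "approve"
    latency_ms = (time.monotonic() - start) * 1000  # must stay well under 5s
    return decision, risk, latency_ms

history = {"c1": 12}
print(decide({"customer_id": "c9", "amount": 2500}, history)[0])  # decline
print(decide({"customer_id": "c1", "amount": 40}, history)[0])    # approve
```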
Dynamic Pricing
E-commerce, ride-sharing, hospitality — prices change based on demand, competition, inventory. Batch pricing with hourly updates means an hour of suboptimal prices. Real-time pricing reacts to events immediately.
IoT & Telemetry
Thousands of sensors generate millions of events per minute. Anomaly detection on streaming data — if machine temperature exceeds threshold, alert arrives in seconds, not hours. Predictive maintenance requires real-time feature engineering.
Operational Dashboards
Live overview of orders, shipments, SLA, throughput. Supply chain visibility — where every shipment is now, not where it was yesterday. Operators need current state, not historical snapshot.
Real-time platform architecture
Apache Kafka — event streaming backbone
Kafka isn’t just a message broker. It’s a distributed commit log, an event streaming platform, and an integration backbone in one:
- Guaranteed delivery: At-least-once (default) or exactly-once (transactional)
- Ordering: Per-partition ordering guarantees sequential processing
- Replay: Consumer can read from any offset — reprocessing, debugging, new consumer
- Retention: Configurable (hours to unlimited) — Kafka as source of truth
- Schema Registry: Schema evolution (Avro, Protobuf) — producer and consumer agree on format
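Per-partition ordering follows from how the producer picks a partition: Kafka's default partitioner hashes the record key (murmur2), so one key always maps to one partition. A minimal simulation of that idea, using `crc32` as a deterministic stand-in for the real hash:

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Kafka's default partitioner hashes the record key (murmur2);
    # crc32 here is a deterministic stand-in for the same idea.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for one key land in one partition, so they stay ordered.
events = [("order-42", "created"), ("order-7", "created"),
          ("order-42", "paid"), ("order-42", "shipped")]
partitions = {}
for key, event in events:
    partitions.setdefault(partition_for(key), []).append((key, event))

for p, evts in sorted(partitions.items()):
    print(p, evts)
```

Ordering is guaranteed only within a partition, which is why the partition key should be the entity whose events must stay sequential (order id, customer id, device id).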
Stream Processing
Apache Flink — our primary stream processor:
- Stateful processing with exactly-once semantics
- Event-time processing — correct results even with out-of-order events
- Windowing: tumbling, sliding, session windows
- Complex Event Processing (CEP) — pattern matching over event streams
- Savepoints and checkpoints for fault tolerance
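To make the windowing concrete: a tumbling event-time window assigns each record to a window by its event timestamp, so a late, out-of-order record still lands in the window it belongs to. A pure-Python sketch of just that arithmetic (in Flink itself this would be a `TumblingEventTimeWindows` aggregation):

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling windows

def window_start(event_time_ms: int) -> int:
    # Align the timestamp down to the start of its window.
    return event_time_ms - (event_time_ms % WINDOW_MS)

def tumbling_count(events):
    # events: (event_time_ms, key) pairs. Grouping by *event* time means
    # an out-of-order record still counts toward the window it belongs to.
    counts = defaultdict(int)
    for ts, key in events:
        counts[(window_start(ts), key)] += 1
    return dict(counts)

events = [
    (10_000, "sensor-a"),
    (59_000, "sensor-a"),
    (61_000, "sensor-a"),  # falls into the next window
    (55_000, "sensor-a"),  # late, out of order — still counted in window 0
]
print(tumbling_count(events))
```

What the sketch omits is exactly what Flink's watermarks solve: deciding when a window is complete and can be emitted despite possible stragglers.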
Kafka Streams — for simpler transformations:
- Library, not cluster — runs as part of your application
- Filtering, mapping, aggregation, joins
- State stores for local state
- Ideal for microservice-based architectures
ksqlDB — SQL over streaming data:
- SELECT, WHERE, GROUP BY, JOIN — like SQL, but over infinite streams
- Materialized views updated in real time
- Ideal for prototyping and simple use cases
Change Data Capture (CDC)
Debezium captures database changes and sends them to Kafka in real-time:
- PostgreSQL, MySQL, SQL Server, MongoDB — supported sources
- Log-based CDC — reads from the WAL/binlog, negligible load on the source database
- Schema propagation — schema changes propagate automatically
- Initial snapshot — first load of entire table, then incremental changes
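A typical Debezium connector registration looks roughly like this — hostnames, credentials, and table names are placeholders, and property keys follow the Debezium PostgreSQL connector (`topic.prefix` superseded `database.server.name` in Debezium 2.x):

```json
{
  "name": "orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial"
  }
}
```

`snapshot.mode: initial` gives the behavior described above: one full load of the table, then incremental changes from the WAL.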
Real-time sinks
Data flows from Kafka to target systems:
- Elasticsearch — full-text search, real-time indexing
- ClickHouse — OLAP queries over streaming data
- Redis — cache for a real-time feature store
- Snowflake/Databricks — streaming ingestion to warehouse/lakehouse
- S3/ADLS — archival of raw events
Monitoring and operations
Kafka monitoring
- Consumer lag — how far the consumer is behind the producer. Growing lag = processing bottleneck
- Throughput — messages/s per topic, partition balance
- Broker health — ISR (in-sync replicas), under-replicated partitions
- Storage — disk usage, retention vs. capacity
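Consumer lag is simple arithmetic over offsets the broker already tracks (in practice read via the AdminClient or `kafka-consumer-groups.sh`); a sketch of the calculation:

```python
def consumer_lag(end_offsets, committed_offsets):
    # Lag per partition = log-end offset (latest produced)
    # minus the consumer group's committed offset.
    return {p: end_offsets[p] - committed_offsets.get(p, 0)
            for p in end_offsets}

end = {0: 1500, 1: 1480, 2: 900}        # broker-side log-end offsets
committed = {0: 1500, 1: 1100, 2: 900}  # consumer group's committed offsets

lag = consumer_lag(end, committed)
print(lag, "total:", sum(lag.values()))  # {0: 0, 1: 380, 2: 0} total: 380
```

A skewed result like this one (all lag on partition 1) usually points at partition imbalance rather than overall under-provisioning — scaling out consumers won't help if one hot key dominates a partition.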
Stream processing monitoring
- Backpressure — processing is slower than input rate
- Checkpoint duration — how long checkpoint takes (Flink)
- Watermark lag — gap between event time and processing time
- State size — growing state = potential memory issue
Alerting
- Consumer lag > threshold → scale up consumers
- Broker offline → immediate alert + automatic rebalance
- Processing latency > SLA → investigate bottleneck
- Error rate spike → circuit breaker + dead letter queue
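The dead-letter-queue pattern in miniature: retry a failing record a bounded number of times, then route it aside so one poison message can't stall the whole partition. The processor and retry count here are illustrative:

```python
import json

def process(record):
    # Stand-in processor: fails on records it cannot parse.
    payload = json.loads(record)
    return payload["value"] * 2

def consume(records, max_retries=2):
    results, dead_letters = [], []
    for record in records:
        for attempt in range(max_retries + 1):
            try:
                results.append(process(record))
                break
            except Exception as exc:
                if attempt == max_retries:
                    # Route the poison record aside instead of
                    # blocking the rest of the partition.
                    dead_letters.append({"record": record, "error": str(exc)})
    return results, dead_letters

ok, dlq = consume(['{"value": 21}', "not-json", '{"value": 5}'])
print(ok, len(dlq))  # [42, 10] 1
```

In a real deployment the dead letters would go to a dedicated Kafka topic with the original headers preserved, so they can be inspected and replayed after the bug is fixed.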
Implementation approach
- Use case assessment (1 week): Identify use cases where real-time brings measurable value. Not everything needs to be real-time.
- Kafka cluster setup (1-2 weeks): Provisioning (Confluent Cloud or self-managed), topic design, security (mTLS, SASL), Schema Registry.
- MVP streaming pipeline (2-4 weeks): CDC from primary source → Kafka → stream processing → target system. End-to-end monitoring.
- Scaling and optimization (ongoing): Additional sources and consumers, performance tuning, partitioning strategies, cost optimization.
Frequently asked questions
When do we actually need real-time instead of batch?
When delays cost money or safety: fraud detection (seconds = approved fraudulent transaction), dynamic pricing (minutes = lost margin), IoT alerting (minutes = damaged machine), inventory management (hours = out-of-stock items). For reporting and analytics, batch with hourly refresh is usually sufficient.
What does a Kafka platform cost to run?
Managed Kafka (Confluent Cloud, AWS MSK): from $500/month for smaller workloads, $5-20K/month for enterprise. Self-managed: lower licensing costs but higher operational overhead. We recommend Confluent Cloud for most projects — the ROI is clear.
Kafka or RabbitMQ?
Kafka for event streaming, high throughput, replay capability, event sourcing. RabbitMQ for classic message queuing, request-reply patterns, lower per-message latency. For data platforms, Kafka is almost always the better choice.
How do you achieve exactly-once processing?
Kafka Transactions + idempotent producers for Kafka-to-Kafka. For Kafka-to-external (DB, API) we use idempotent consumers with deduplication on the target side. Flink provides exactly-once semantics with checkpointing.
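The idempotent-consumer half of this can be sketched in a few lines: deduplicate on a stable event id before applying the effect, so a redelivery under at-least-once semantics changes nothing. The in-memory set stands in for what would be a keyed store or a database unique constraint:

```python
processed_ids = set()  # in production: keyed store / DB unique constraint
balance = 0

def apply_once(event):
    # Deduplicate by a stable event id so at-least-once *delivery*
    # still yields exactly-once *effects* on the target system.
    global balance
    if event["id"] in processed_ids:
        return False  # duplicate delivery, no effect
    processed_ids.add(event["id"])
    balance += event["amount"]
    return True

# Redelivery of e1 (e.g. after a consumer crash before offset commit)
# is a no-op, so the balance is correct.
for ev in [{"id": "e1", "amount": 100},
           {"id": "e2", "amount": 50},
           {"id": "e1", "amount": 100}]:
    apply_once(ev)
print(balance)  # 150
```

The critical detail in a real target database is that the dedup check and the write must happen in one transaction, otherwise a crash between them reintroduces the duplicate.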