Real-time Analytics
Yesterday's data means yesterday's decisions.
We build real-time processing platforms with Apache Kafka, Flink and streaming analytics. Sub-second latency for use cases where batch processing isn't enough — fraud detection, dynamic pricing, IoT telemetry, live dashboards.
When batch processing isn’t enough
Batch processing (daily/hourly ETL) is the right choice for most analytical use cases. But there are scenarios where delays cost real money:
Fraud Detection
Fraudulent transactions must be detected in seconds, not hours. Real-time scoring: transaction → enrichment (customer history, geolocation, device fingerprint) → ML model → approve/decline. Latency >5s = approved fraud.
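The scoring path above can be sketched in miniature. Everything here — `enrich`, `score_transaction`, the 0.8 decline threshold — is hypothetical stand-in logic to show the shape of the pipeline, not a production model:

```python
import time

DECLINE_THRESHOLD = 0.8  # illustrative cut-off, not a real tuned value

def enrich(txn, customer_history):
    # Attach the features the model needs (customer history, geo,
    # device fingerprint); here just a 24h transaction count.
    features = dict(txn)
    features["txn_count_24h"] = customer_history.get(txn["customer_id"], 0)
    return features

def score_transaction(features):
    # Stand-in for an ML model: flag high amounts from unknown customers.
    risk = 0.0
    if features["amount"] > 1000:
        risk += 0.5
    if features["txn_count_24h"] == 0:
        risk += 0.4
    return risk

def decide(txn, customer_history):
    start = time.monotonic()
    features = enrich(txn, customer_history)
    risk = score_transaction(features)
    decision = "decline" if risk >= DECLINE_THRESHOLD else "approve"
    latency_ms = (time.monotonic() - start) * 1000  # must stay well under 5s
    return decision, risk, latency_ms

history = {"c1": 12}
print(decide({"customer_id": "c9", "amount": 2500}, history)[0])  # decline
print(decide({"customer_id": "c1", "amount": 40}, history)[0])    # approve
```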
Dynamic Pricing
E-commerce, ride-sharing, hospitality — prices change based on demand, competition, inventory. Batch pricing with hourly updates means an hour of suboptimal prices. Real-time pricing reacts to events immediately.
IoT & Telemetry
Thousands of sensors generate millions of events per minute. Anomaly detection on streaming data — if machine temperature exceeds threshold, alert arrives in seconds, not hours. Predictive maintenance requires real-time feature engineering.
Operational Dashboards
Live overview of orders, shipments, SLA, throughput. Supply chain visibility — where every shipment is now, not where it was yesterday. Operators need current state, not historical snapshot.
Real-time platform architecture
Apache Kafka — event streaming backbone
Kafka isn’t just a message broker. It’s a distributed commit log, an event streaming platform, and an integration backbone in one:
- Guaranteed delivery: At-least-once (default) or exactly-once (transactional)
- Ordering: Per-partition ordering guarantees sequential processing
- Replay: Consumer can read from any offset — reprocessing, debugging, new consumer
- Retention: Configurable (hours to unlimited) — Kafka as source of truth
- Schema Registry: Schema evolution (Avro, Protobuf) — producer and consumer agree on format
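Per-partition ordering follows from how the producer picks a partition: Kafka's default partitioner hashes the record key (murmur2), so one key always maps to one partition. A minimal simulation of that idea, using `crc32` as a deterministic stand-in for the real hash:

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Kafka's default partitioner hashes the record key (murmur2);
    # crc32 here is a deterministic stand-in for the same idea.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for one key land in one partition, so they stay ordered.
events = [("order-42", "created"), ("order-7", "created"),
          ("order-42", "paid"), ("order-42", "shipped")]
partitions = {}
for key, event in events:
    partitions.setdefault(partition_for(key), []).append((key, event))

for p, evts in sorted(partitions.items()):
    print(p, evts)
```

Ordering is guaranteed only within a partition, which is why the partition key should be the entity whose events must stay sequential (order id, customer id, device id).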
Stream Processing
Apache Flink — our primary stream processor:
- Stateful processing with exactly-once semantics
- Event-time processing — correct results even with out-of-order events
- Windowing: tumbling, sliding, session windows
- Complex Event Processing (CEP) — pattern matching over event streams
- Savepoints and checkpoints for fault tolerance
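To make the windowing concrete: a tumbling event-time window assigns each record to a window by its event timestamp, so a late, out-of-order record still lands in the window it belongs to. A pure-Python sketch of just that arithmetic (in Flink itself this would be a `TumblingEventTimeWindows` aggregation):

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling windows

def window_start(event_time_ms: int) -> int:
    # Align the timestamp down to the start of its window.
    return event_time_ms - (event_time_ms % WINDOW_MS)

def tumbling_count(events):
    # events: (event_time_ms, key) pairs. Grouping by *event* time means
    # an out-of-order record still counts toward the window it belongs to.
    counts = defaultdict(int)
    for ts, key in events:
        counts[(window_start(ts), key)] += 1
    return dict(counts)

events = [
    (10_000, "sensor-a"),
    (59_000, "sensor-a"),
    (61_000, "sensor-a"),  # falls into the next window
    (55_000, "sensor-a"),  # late, out of order — still counted in window 0
]
print(tumbling_count(events))
```

What the sketch omits is exactly what Flink's watermarks solve: deciding when a window is complete and can be emitted despite possible stragglers.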
Kafka Streams — for simpler transformations:
- Library, not cluster — runs as part of your application
- Filtering, mapping, aggregation, joins
- State stores for local state
- Ideal for microservice-based architectures
ksqlDB — SQL over streaming data:
- SELECT, WHERE, GROUP BY, JOIN — like SQL, but over infinite streams
- Materialized views updated in real time
- Ideal for prototyping and simple use cases
Change Data Capture (CDC)
Debezium captures database changes and sends them to Kafka in real-time:
- PostgreSQL, MySQL, SQL Server, MongoDB — supported sources
- Log-based CDC — reads from the WAL/binlog, negligible load on the source database
- Schema propagation — schema changes propagate automatically
- Initial snapshot — first load of entire table, then incremental changes
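A typical Debezium connector registration looks roughly like this — hostnames, credentials, and table names are placeholders, and property keys follow the Debezium PostgreSQL connector (`topic.prefix` superseded `database.server.name` in Debezium 2.x):

```json
{
  "name": "orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial"
  }
}
```

`snapshot.mode: initial` gives the behavior described above: one full load of the table, then incremental changes from the WAL.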
Real-time sinks
Data flows from Kafka to target systems:
- Elasticsearch — full-text search, real-time indexing
- ClickHouse — OLAP queries over streaming data
- Redis — cache for a real-time feature store
- Snowflake/Databricks — streaming ingestion to warehouse/lakehouse
- S3/ADLS — archival of raw events
Monitoring and operations
Kafka monitoring
- Consumer lag — how far the consumer is behind the producer. Growing lag = processing bottleneck
- Throughput — messages/s per topic, partition balance
- Broker health — ISR (in-sync replicas), under-replicated partitions
- Storage — disk usage, retention vs. capacity
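Consumer lag is simple arithmetic over offsets the broker already tracks (in practice read via the AdminClient or `kafka-consumer-groups.sh`); a sketch of the calculation:

```python
def consumer_lag(end_offsets, committed_offsets):
    # Lag per partition = log-end offset (latest produced)
    # minus the consumer group's committed offset.
    return {p: end_offsets[p] - committed_offsets.get(p, 0)
            for p in end_offsets}

end = {0: 1500, 1: 1480, 2: 900}        # broker-side log-end offsets
committed = {0: 1500, 1: 1100, 2: 900}  # consumer group's committed offsets

lag = consumer_lag(end, committed)
print(lag, "total:", sum(lag.values()))  # {0: 0, 1: 380, 2: 0} total: 380
```

A skewed result like this one (all lag on partition 1) usually points at partition imbalance rather than overall under-provisioning — scaling out consumers won't help if one hot key dominates a partition.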
Stream processing monitoring
- Backpressure — processing is slower than input rate
- Checkpoint duration — how long checkpoint takes (Flink)
- Watermark lag — gap between event time and processing time
- State size — growing state = potential memory issue
Alerting
- Consumer lag > threshold → scale up consumers
- Broker offline → immediate alert + automatic rebalance
- Processing latency > SLA → investigate bottleneck
- Error rate spike → circuit breaker + dead letter queue
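The dead-letter-queue pattern in miniature: retry a failing record a bounded number of times, then route it aside so one poison message can't stall the whole partition. The processor and retry count here are illustrative:

```python
import json

def process(record):
    # Stand-in processor: fails on records it cannot parse.
    payload = json.loads(record)
    return payload["value"] * 2

def consume(records, max_retries=2):
    results, dead_letters = [], []
    for record in records:
        for attempt in range(max_retries + 1):
            try:
                results.append(process(record))
                break
            except Exception as exc:
                if attempt == max_retries:
                    # Route the poison record aside instead of
                    # blocking the rest of the partition.
                    dead_letters.append({"record": record, "error": str(exc)})
    return results, dead_letters

ok, dlq = consume(['{"value": 21}', "not-json", '{"value": 5}'])
print(ok, len(dlq))  # [42, 10] 1
```

In a real deployment the dead letters would go to a dedicated Kafka topic with the original headers preserved, so they can be inspected and replayed after the bug is fixed.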
Implementation approach
- Use case assessment (1 week): Identify use cases where real-time brings measurable value. Not everything needs to be real-time.
- Kafka cluster setup (1-2 weeks): Provisioning (Confluent Cloud or self-managed), topic design, security (mTLS, SASL), Schema Registry.
- MVP streaming pipeline (2-4 weeks): CDC from primary source → Kafka → stream processing → target system. End-to-end monitoring.
- Scaling and optimization (ongoing): Additional sources and consumers, performance tuning, partitioning strategies, cost optimization.
Frequently asked questions
When do we actually need real-time instead of batch?
When delays cost money or safety: fraud detection (seconds = approved fraudulent transaction), dynamic pricing (minutes = lost margin), IoT alerting (minutes = damaged machine), inventory management (hours = out-of-stock items). For reporting and analytics, batch with hourly refresh is usually sufficient.
What does a Kafka platform cost to run?
Managed Kafka (Confluent Cloud, AWS MSK): from $500/month for smaller workloads, $5-20K/month for enterprise. Self-managed: lower licensing costs but higher operational overhead. We recommend Confluent Cloud for most projects — the ROI is clear.
Kafka or RabbitMQ?
Kafka for event streaming, high throughput, replay capability, event sourcing. RabbitMQ for classic message queuing, request-reply patterns, lower per-message latency. For data platforms, Kafka is almost always the better choice.
How do you achieve exactly-once processing?
Kafka Transactions + idempotent producers for Kafka-to-Kafka. For Kafka-to-external (DB, API) we use idempotent consumers with deduplication on the target side. Flink provides exactly-once semantics with checkpointing.
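The idempotent-consumer half of this can be sketched in a few lines: deduplicate on a stable event id before applying the effect, so a redelivery under at-least-once semantics changes nothing. The in-memory set stands in for what would be a keyed store or a database unique constraint:

```python
processed_ids = set()  # in production: keyed store / DB unique constraint
balance = 0

def apply_once(event):
    # Deduplicate by a stable event id so at-least-once *delivery*
    # still yields exactly-once *effects* on the target system.
    global balance
    if event["id"] in processed_ids:
        return False  # duplicate delivery, no effect
    processed_ids.add(event["id"])
    balance += event["amount"]
    return True

# Redelivery of e1 (e.g. after a consumer crash before offset commit)
# is a no-op, so the balance is correct.
for ev in [{"id": "e1", "amount": 100},
           {"id": "e2", "amount": 50},
           {"id": "e1", "amount": 100}]:
    apply_once(ev)
print(balance)  # 150
```

The critical detail in a real target database is that the dedup check and the write must happen in one transaction, otherwise a crash between them reintroduces the duplicate.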