
Real-time Analytics

Yesterday's data means yesterday's decisions.

We build real-time processing platforms with Apache Kafka, Flink and streaming analytics. Sub-second latency for use cases where batch processing isn't enough — fraud detection, dynamic pricing, IoT telemetry, live dashboards.

  • <1s end-to-end latency
  • 1M+ events/min throughput
  • 99.99% availability
  • 4-8 weeks to an MVP implementation

When batch processing isn’t enough

Batch processing (daily/hourly ETL) is the right choice for most analytical use cases. But there are scenarios where delays cost real money:

Fraud Detection

Fraudulent transactions must be detected in seconds, not hours. Real-time scoring: transaction → enrichment (customer history, geolocation, device fingerprint) → ML model → approve/decline. Latency above 5 seconds means the fraudulent transaction is already approved.
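The scoring pipeline above can be sketched in a few lines. This is a minimal illustration, not a production design: all names are hypothetical, the enrichment lookup stands in for a low-latency feature store, and the weighted rule stands in for a real ML model.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    customer_id: str
    amount: float
    country: str

# Hypothetical enrichment source -- in production this would be a
# low-latency feature store (e.g. Redis), not an in-memory dict.
CUSTOMER_HISTORY = {"c1": {"home_country": "CZ", "avg_amount": 120.0}}

def enrich(tx: Transaction) -> dict:
    history = CUSTOMER_HISTORY.get(tx.customer_id, {})
    return {
        "amount_ratio": tx.amount / max(history.get("avg_amount", tx.amount), 1.0),
        "foreign": tx.country != history.get("home_country", tx.country),
    }

def score(features: dict) -> float:
    # Stand-in for an ML model: a simple weighted rule.
    return 0.6 * min(features["amount_ratio"] / 10, 1.0) + 0.4 * features["foreign"]

def decide(tx: Transaction, threshold: float = 0.5) -> str:
    return "decline" if score(enrich(tx)) >= threshold else "approve"
```

The whole path is synchronous and in-memory, which is what keeps the end-to-end decision under the latency budget.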

Dynamic Pricing

E-commerce, ride-sharing, hospitality — prices change based on demand, competition, inventory. Batch pricing with hourly updates means an hour of suboptimal prices. Real-time pricing reacts to events immediately.

IoT & Telemetry

Thousands of sensors generate millions of events per minute. Anomaly detection on streaming data — if machine temperature exceeds threshold, alert arrives in seconds, not hours. Predictive maintenance requires real-time feature engineering.

Operational Dashboards

Live overview of orders, shipments, SLA, throughput. Supply chain visibility — where every shipment is now, not where it was yesterday. Operators need the current state, not a historical snapshot.

Real-time platform architecture

Apache Kafka — event streaming backbone

Kafka isn’t just a message broker. It’s a distributed commit log, event streaming platform and integration backbone in one:

  • Guaranteed delivery: At-least-once (default) or exactly-once (transactional)
  • Ordering: Per-partition ordering guarantees sequential processing
  • Replay: Consumer can read from any offset — reprocessing, debugging, new consumer
  • Retention: Configurable (hours to unlimited) — Kafka as source of truth
  • Schema Registry: Schema evolution (Avro, Protobuf) — producer and consumer agree on format
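The delivery guarantees above map to a handful of producer settings. A hedged sketch using the confluent-kafka client's configuration keys; the broker address and transactional.id are placeholders:

```python
# Configuration for an exactly-once (transactional) Kafka producer.
producer_config = {
    "bootstrap.servers": "broker:9092",       # placeholder address
    "enable.idempotence": True,               # no duplicates on producer retry
    "acks": "all",                            # wait for all in-sync replicas
    "transactional.id": "orders-producer-1",  # enables exactly-once transactions
}

# With confluent-kafka installed and a broker running, usage would be roughly:
# from confluent_kafka import Producer
# p = Producer(producer_config)
# p.init_transactions()
# p.begin_transaction()
# p.produce("orders", key=b"42", value=b"...")
# p.commit_transaction()
```

Leaving `transactional.id` out but keeping `enable.idempotence` gives the cheaper at-least-once-without-duplicates mode.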

Stream Processing

Apache Flink — our primary stream processor:

  • Stateful processing with exactly-once semantics
  • Event-time processing — correct results even with out-of-order events
  • Windowing: tumbling, sliding, session windows
  • Complex Event Processing (CEP) — pattern matching over event streams
  • Savepoints and checkpoints for fault tolerance
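Event-time tumbling windows are the key idea here, and the assignment logic fits in a few lines. A pure-Python sketch of the concept (Flink adds distributed state and exactly-once on top); the window size and events are made up:

```python
from collections import defaultdict

WINDOW_MS = 10_000  # 10-second tumbling windows

def window_start(event_time_ms: int) -> int:
    # Every event falls into exactly one window, keyed by its start time.
    return event_time_ms - (event_time_ms % WINDOW_MS)

def aggregate(events):
    """events: iterable of (event_time_ms, value) pairs. Out-of-order
    events land in the correct window because assignment uses event
    time, not arrival order."""
    windows = defaultdict(int)
    for ts, value in events:
        windows[window_start(ts)] += value
    return dict(windows)

# Out-of-order stream: the 3_000 ms event arrives after the 12_000 ms one.
stream = [(1_000, 5), (12_000, 7), (3_000, 2)]
```

What this sketch omits is the watermark: the mechanism that decides when a window is complete and can be emitted despite possible stragglers.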

Kafka Streams — for simpler transformations:

  • Library, not cluster — runs as part of your application
  • Filtering, mapping, aggregation, joins
  • State stores for local state
  • Ideal for microservice-based architectures

ksqlDB — SQL over streaming data:

  • SELECT, WHERE, GROUP BY, JOIN — like SQL, but over infinite streams
  • Materialized views updated in real time
  • Ideal for prototyping and simple use cases

Change Data Capture (CDC)

Debezium captures database changes and sends them to Kafka in real-time:

  • PostgreSQL, MySQL, SQL Server, MongoDB — supported sources
  • Log-based CDC — reads from the WAL/binlog, no impact on the source database
  • Schema propagation — schema changes propagate automatically
  • Initial snapshot — first load of entire table, then incremental changes
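A typical Debezium setup is just a connector configuration submitted to Kafka Connect. A sketch for PostgreSQL, with hostnames, credentials, and table names as placeholders (key names follow the Debezium PostgreSQL connector):

```python
import json

connector = {
    "name": "orders-cdc",  # placeholder connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db.internal",   # placeholder
        "database.port": "5432",
        "database.user": "debezium",
        "database.dbname": "orders",
        "plugin.name": "pgoutput",            # log-based CDC via logical decoding
        "topic.prefix": "cdc",                # topics become cdc.<schema>.<table>
        "table.include.list": "public.orders",
        "snapshot.mode": "initial",           # full snapshot, then stream the WAL
    },
}

# This JSON payload would be POSTed to the Kafka Connect REST API.
payload = json.dumps(connector)
```

The `snapshot.mode: initial` setting is what gives the "first load of the entire table, then incremental changes" behaviour described above.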

Real-time sinks

Data flows from Kafka to target systems:

  • Elasticsearch — full-text search, real-time indexing
  • ClickHouse — OLAP queries over streaming data
  • Redis — cache for a real-time feature store
  • Snowflake/Databricks — streaming ingestion into the warehouse/lakehouse
  • S3/ADLS — archival of raw events

Monitoring and operations

Kafka monitoring

  • Consumer lag — how far consumer is behind producer. Growing lag = processing bottleneck
  • Throughput — messages/s per topic, partition balance
  • Broker health — ISR (in-sync replicas), under-replicated partitions
  • Storage — disk usage, retention vs. capacity
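Consumer lag, the first metric above, is simple arithmetic: per partition, the broker's log-end offset minus the consumer group's committed offset. A minimal sketch with illustrative offsets (in production they come from the Kafka admin API or `kafka-consumer-groups.sh`):

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition: how many messages the consumer still has to read."""
    return {
        partition: log_end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in log_end_offsets
    }

log_end = {0: 1_500, 1: 1_480, 2: 1_510}    # latest offset per partition
committed = {0: 1_500, 1: 1_200, 2: 1_505}  # consumer group's position

lags = consumer_lag(log_end, committed)
# A lag that keeps growing on one partition (here partition 1) points to
# a processing bottleneck or a skewed partition key.
```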

Stream processing monitoring

  • Backpressure — processing is slower than input rate
  • Checkpoint duration — how long checkpoint takes (Flink)
  • Watermark lag — gap between event time and processing time
  • State size — growing state = potential memory issue

Alerting

  • Consumer lag > threshold → scale up consumers
  • Broker offline → immediate alert + automatic rebalance
  • Processing latency > SLA → investigate bottleneck
  • Error rate spike → circuit breaker + dead letter queue
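The rules above are threshold checks at heart. A toy sketch of the evaluation logic; thresholds are illustrative, and in practice these rules would live in an alerting system such as Prometheus/Alertmanager:

```python
LAG_THRESHOLD = 10_000   # illustrative consumer-lag limit (messages)
LATENCY_SLA_MS = 1_000   # illustrative end-to-end latency SLA

def evaluate(metrics: dict) -> list:
    """Return the list of actions triggered by the current metrics."""
    alerts = []
    if metrics["consumer_lag"] > LAG_THRESHOLD:
        alerts.append("scale up consumers")
    if not metrics["broker_online"]:
        alerts.append("broker offline: page on-call, rebalance")
    if metrics["p99_latency_ms"] > LATENCY_SLA_MS:
        alerts.append("latency over SLA: investigate bottleneck")
    return alerts
```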

Implementation approach

  1. Use case assessment (1 week): Identify use cases where real-time brings measurable value. Not everything needs to be real-time.
  2. Kafka cluster setup (1-2 weeks): Provisioning (Confluent Cloud or self-managed), topic design, security (mTLS, SASL), Schema Registry.
  3. MVP streaming pipeline (2-4 weeks): CDC from primary source → Kafka → stream processing → target system. End-to-end monitoring.
  4. Scaling and optimization (ongoing): Additional sources and consumers, performance tuning, partitioning strategies, cost optimization.

Frequently asked questions

When does real-time make sense instead of batch?

When delays cost money or safety: fraud detection (seconds = an approved fraudulent transaction), dynamic pricing (minutes = lost margin), IoT alerting (minutes = a damaged machine), inventory management (hours = out-of-stock items). For reporting and analytics, batch with hourly refresh is usually sufficient.

What does a Kafka platform cost to run?

Managed Kafka (Confluent Cloud, AWS MSK): from $500/month for smaller workloads, $5-20K/month for enterprise. Self-managed: lower license costs but higher operations overhead. We recommend Confluent Cloud for most projects — the ROI is clear.

Kafka or RabbitMQ?

Kafka for event streaming, high throughput, replay capability, event sourcing. RabbitMQ for classic message queuing, the request-reply pattern, lower per-message latency. For data platforms, Kafka is almost always the better choice.

How do you achieve exactly-once processing?

Kafka transactions + idempotent producers for Kafka-to-Kafka. For Kafka-to-external (DB, API) we use idempotent consumers with deduplication on the target side. Flink provides exactly-once semantics with checkpointing.
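The Kafka-to-external case deserves a sketch, since it is the part Kafka itself cannot solve. A toy idempotent consumer that deduplicates on a message key before writing; the dict stands in for the real target database, where the dedup would typically be a unique constraint or a keyed upsert:

```python
processed_keys = set()   # in production: unique constraint / keyed upsert
sink = {}                # stand-in for the target database

def handle(message_id: str, payload: str) -> bool:
    """Apply a message to the sink. Returns False for a duplicate."""
    if message_id in processed_keys:
        return False            # redelivery after a retry: skip, no double write
    processed_keys.add(message_id)
    sink[message_id] = payload
    return True

# At-least-once delivery may replay "m1"; the second apply is a no-op.
results = [handle("m1", "a"), handle("m2", "b"), handle("m1", "a")]
```

Because replays become no-ops, the at-least-once delivery from Kafka turns into effectively-once writes on the target side.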

Do you have a project?

Let's talk about it.

Schedule a meeting