Telemetry & Streaming
Data from sensor to backend in under 100ms.
We build telemetry pipelines that deliver data reliably, quickly and in guaranteed order — from a single sensor to thousands of devices.
Telemetry as the Bloodstream of IoT Systems¶
Sensors without a reliable data pipeline are expensive hardware. They measure, but the data never arrives. Or it arrives late. Or it gets lost. Or it arrives out of order and anomaly detection generates false alarms.
We build telemetry chains where every message arrives where it should, when it should, and in the correct order. From sensor through MQTT broker to Kafka, through stream processing to a time-series database and dashboard. Every step monitored, scalable, resilient.
MQTT as Transport¶
Why MQTT¶
MQTT (originally MQ Telemetry Transport) is the de facto standard for IoT communication. Designed in 1999 for monitoring oil pipelines — reliable data transfer over unreliable satellite links. Today it runs on billions of devices.
Key properties for IoT:
- Lightweight: 2-byte minimum fixed header. A message carrying a 100 B payload totals little more than 102 B on the wire. An HTTP request for the same data: 500 B+ of headers alone.
- Pub/sub model: Device publishes to a topic, backend subscribes. Decoupling — the device doesn’t know (and doesn’t need to know) who reads the data.
- Persistent sessions: Broker remembers subscriptions and undelivered messages for offline devices. After reconnecting, it delivers everything that was missed.
- Last Will and Testament: When connecting, a device defines its “last will” — a message the broker publishes if the device disappears without a clean disconnect. Offline devices are detected as soon as the keep-alive times out.
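The 2-byte minimum header can be seen by encoding a PUBLISH packet by hand. A minimal sketch of the wire format (not a full client — topic and payload are illustrative, and MQTT 5 properties are omitted):

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length encoding: 7 bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)

def publish_packet(topic: str, payload: bytes, qos: int = 0) -> bytes:
    """Build a minimal MQTT 3.1.1 PUBLISH packet."""
    t = topic.encode("utf-8")
    var_header = len(t).to_bytes(2, "big") + t
    if qos > 0:
        var_header += (1).to_bytes(2, "big")  # packet identifier
    body = var_header + payload
    # Fixed header: packet type 3 (PUBLISH) in the high nibble, QoS in flags.
    fixed = bytes([0x30 | (qos << 1)]) + encode_remaining_length(len(body))
    return fixed + body

pkt = publish_packet("sensors/1/temperature", b"23.5")
# Fixed header stays at 2 bytes for any packet body under 128 B.
```

For a body under 128 B the remaining-length field fits in one byte, which is where the 2-byte fixed header comes from; HTTP has no comparably compact framing.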
Quality of Service¶
Three levels of delivery guarantee:
- QoS 0 (At most once): Fire-and-forget. No acknowledgement. For data where occasional loss is acceptable (high-frequency telemetry where a missing sample can be interpolated).
- QoS 1 (At least once): Delivery confirmation. Possible duplicates. For most telemetry — the backend must be idempotent.
- QoS 2 (Exactly once): Four-step handshake. No losses, no duplicates. For critical data (transactions, alarms, commands). Higher overhead — used only where necessary.
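The idempotency requirement for QoS 1 can be met by deduplicating on a message key before writing. A minimal sketch (in production the `seen` set would be bounded with a TTL or LRU policy; names are illustrative):

```python
class IdempotentSink:
    """Absorbs QoS 1 redeliveries by keying on (device_id, message_id)."""

    def __init__(self):
        self.seen = set()
        self.stored = []

    def handle(self, device_id: str, message_id: int, payload: dict) -> bool:
        key = (device_id, message_id)
        if key in self.seen:
            return False  # duplicate redelivery — drop silently
        self.seen.add(key)
        self.stored.append(payload)
        return True

sink = IdempotentSink()
sink.handle("dev-001", 1, {"temperature": 23.5})
sink.handle("dev-001", 1, {"temperature": 23.5})  # broker redelivers after lost PUBACK
```

With this in place, QoS 1 gives effectively exactly-once storage without paying for the QoS 2 handshake.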
MQTT 5.0 Features¶
- Shared subscriptions: Load balancing across multiple consumers. Subscribing to $share/group/sensors/+/temperature distributes messages round-robin among the group’s subscribers.
- Topic aliases: Repeated topic names are replaced by a short numeric alias. Reduces bandwidth by 10-30%.
- Flow control: Receiver can say “slow down” — protection against overwhelming a slow consumer.
- Request/response: Native request/response pattern via correlation data and response topic.
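The topic-alias mechanism amounts to a small sender-side cache. A simplified model (a real client also respects the Topic Alias Maximum negotiated in CONNACK; the class and limit here are illustrative):

```python
class TopicAliasEncoder:
    """Sender side of MQTT 5 topic aliases: send the full topic once,
    then only a small integer alias on every later publish."""

    def __init__(self, maximum: int = 10):
        self.aliases = {}
        self.maximum = maximum  # broker's advertised Topic Alias Maximum

    def encode(self, topic: str):
        """Return (topic_to_send, alias) for the next PUBLISH."""
        if topic in self.aliases:
            return "", self.aliases[topic]  # alias alone, topic omitted
        if len(self.aliases) < self.maximum:
            alias = len(self.aliases) + 1
            self.aliases[topic] = alias
            return topic, alias             # first use: full topic + alias
        return topic, 0                     # table full: plain publish

enc = TopicAliasEncoder()
first = enc.encode("sensors/1/temperature")   # full topic goes out once
second = enc.encode("sensors/1/temperature")  # then just the alias
```

For a device publishing the same long topic every few seconds, the saving is the full topic string on every message after the first.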
MQTT Broker¶
EMQX: Distributed, cluster-capable. Millions of concurrent connections. Rule engine for routing and transformation. Bridge to Kafka, HTTP, databases.
Mosquitto: Lightweight, single-node. Ideal for edge and smaller deployments. C implementation, minimal resources.
Azure IoT Hub / AWS IoT Core: Managed MQTT broker with integrated device management. Vendor lock-in, but zero operations.
Kafka for Stream Processing¶
MQTT delivers data to the broker. Kafka is the central nervous system — event store, stream processor, integration hub.
Why Kafka After MQTT¶
An MQTT broker is transport — it delivers messages to subscribers and (typically) forgets. Kafka is a durable event log — messages stored on disk, retention from days to years. Replay at any time. A new consumer can process historical data from the beginning.
Architecture pattern:
Device → MQTT Broker → Kafka Connect/Bridge → Kafka → Consumers
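The difference between transport and durable log can be reduced to one property: offsets plus replay. A minimal in-memory model of the Kafka side (real Kafka adds partitions, disk persistence, and retention — this only illustrates the replay semantics):

```python
class EventLog:
    """Minimal model of a Kafka-style durable log: append-only,
    addressed by offset, readable from any point at any time."""

    def __init__(self):
        self.records = []

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100):
        return self.records[offset:offset + max_records]

log = EventLog()
for value in [21.5, 21.7, 22.0]:
    log.append(value)

# A consumer added long after the data arrived replays from offset 0.
late_consumer = log.read(0)
```

An MQTT broker has no equivalent of `read(0)`: once a message is delivered to current subscribers, it is gone.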
Stream Processing¶
Kafka Streams or Apache Flink for real-time processing:
- Aggregation: Average temperature over the last 5 minutes, maximum vibration over an hour
- Windowing: Tumbling, hopping, sliding windows. Session windows for activity detection.
- Anomaly detection: Z-score, moving average, isolation forest. Alert when value deviates from baseline.
- Enrichment: Adding context — device metadata, location, customer. Join stream with reference data.
- Filtering: Noise filtering, deduplication, format validation.
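Two of the patterns above — tumbling windows and z-score anomaly detection — fit in a few lines of plain Python. A sketch of the logic only (Kafka Streams or Flink add the distributed state, watermarking, and fault tolerance; window size and threshold are illustrative):

```python
from collections import defaultdict

def tumbling_avg(samples, window_s: int = 300):
    """Average value per tumbling window.
    samples: iterable of (unix_timestamp_s, value)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // window_s * window_s].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}

def zscore_alerts(values, threshold: float = 3.0):
    """Indices whose z-score against the series exceeds the threshold."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5 or 1.0
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

avgs = tumbling_avg([(0, 1.0), (100, 3.0), (310, 5.0)])
alerts = zscore_alerts([10.0] * 20 + [100.0])  # one obvious outlier
```

Note the interaction with QoS and ordering from the transport section: a z-score over out-of-order or duplicated samples is exactly how false alarms are born.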
Consumer Groups¶
Parallel processing via consumer groups:
- Alerting consumer: Real-time rule evaluation, push notification on threshold breach
- Storage consumer: Writing to time-series DB for historical queries
- Analytics consumer: Aggregation for dashboards and reports
- ML consumer: Feature store for ML models, training data pipeline
Each consumer independently scalable. Adding a new consumer without impact on existing ones.
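The independence comes from each group tracking its own offset into the shared log. A toy model of that bookkeeping (real Kafka stores committed offsets per group and partition on the broker; the list-backed log here is illustrative):

```python
class ConsumerGroups:
    """Each group keeps its own read position into one shared log,
    so adding a group never disturbs the others."""

    def __init__(self, log: list):
        self.log = log
        self.offsets = {}  # group name -> next offset to read

    def poll(self, group: str) -> list:
        offset = self.offsets.get(group, 0)   # unknown group starts at 0
        batch = self.log[offset:]
        self.offsets[group] = len(self.log)   # commit the new position
        return batch

log = [1, 2, 3]
groups = ConsumerGroups(log)
storage_first = groups.poll("storage")    # reads everything so far
log.append(4)
alerting_first = groups.poll("alerting")  # new group replays from 0
storage_second = groups.poll("storage")   # only the record it hasn't seen
```

This is why an ML consumer can be bolted on months later and still build its feature store from the full retained history.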
Data Pipeline Architecture¶
From Sensor to Dashboard¶
Sensor → [Local buffer] → MQTT (QoS 1) → MQTT Broker
→ Kafka Connect → Kafka Topic (raw)
→ Stream Processing (validation, enrichment)
→ Kafka Topic (processed)
→ InfluxDB/TimescaleDB (storage)
→ Grafana (visualization)
→ Alert Engine (rules)
→ ML Pipeline (features)
Every step is independently scalable, monitored and recoverable.
Dead Letter Queue¶
Messages that fail validation (wrong format, unknown device, out of range) are not lost. They go to a dead letter queue for analysis:
- Automatic notification on DLQ growth
- Dashboard with top error reasons
- Manual review and reprocessing
- Root cause analysis — bad firmware? Bug in device code?
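The routing decision itself is simple; the value is in recording *why* a message failed. A sketch of the validation gate (the device registry, field names, and the -40…125 °C range are illustrative):

```python
KNOWN_DEVICES = {"dev-001", "dev-002"}  # hypothetical device registry

def route(message: dict, processed: list, dlq: list) -> None:
    """Validate one telemetry message; failures go to the DLQ with a reason."""
    if not isinstance(message.get("value"), (int, float)):
        dlq.append({"reason": "wrong_format", "message": message})
    elif message.get("device_id") not in KNOWN_DEVICES:
        dlq.append({"reason": "unknown_device", "message": message})
    elif not (-40.0 <= message["value"] <= 125.0):
        dlq.append({"reason": "out_of_range", "message": message})
    else:
        processed.append(message)

processed, dlq = [], []
for msg in [
    {"device_id": "dev-001", "value": 23.5},   # valid
    {"device_id": "dev-999", "value": 20.0},   # not in the registry
    {"device_id": "dev-001", "value": "hot"},  # malformed payload
]:
    route(msg, processed, dlq)
```

Counting DLQ entries per `reason` is what feeds the top-error-reasons dashboard and the growth alert.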
Compression and Efficiency¶
- Protocol Buffers: Binary serialization, 3-10× smaller than JSON. Schema evolution with backward/forward compatibility. Ideal for high-throughput telemetry.
- MessagePack: JSON-compatible binary format. Easier adoption than Protobuf, still 30-50% smaller than JSON.
- Device-side batching: Local buffer, sending after N messages or T seconds. Reduces connection overhead.
- Adaptive sampling: Normal operation: 1 sample/min. Anomaly detected: 10 samples/s. Dynamic granularity based on need.
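The size gap between JSON and a schema-based binary encoding is easy to demonstrate. A stand-in using a fixed `struct` layout rather than real Protobuf or MessagePack (same principle: the schema lives in code, not in every message; field names and widths are illustrative):

```python
import json
import struct

reading = {"device_id": 42, "ts": 1700000000, "temperature": 23.5}

# JSON repeats every field name in every single message.
as_json = json.dumps(reading).encode("utf-8")

# Fixed binary layout: u16 device id, u32 unix timestamp, f32 value.
as_binary = struct.pack(">HIf", reading["device_id"], reading["ts"],
                        reading["temperature"])

ratio = len(as_json) / len(as_binary)  # roughly 5-6x smaller here
```

Real Protobuf adds varint encoding and schema evolution on top, which is why the ratio for typical telemetry lands in the quoted 3-10× range.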
Time-Series Storage¶
InfluxDB¶
Native time-series database. Optimized for write-heavy workloads:
- Ingest: hundreds of thousands of points per second
- Flux query language for transformations and analysis
- Retention policies: automatic expiration of old data
- Continuous queries for pre-aggregation
TimescaleDB¶
PostgreSQL extension for time-series:
- Full SQL — JOINs with relational data (device metadata, customers)
- Hypertables for automatic partitioning
- Compression: 90-95% space savings for older data
- Continuous aggregates for materialized views
Choice¶
InfluxDB for purely telemetric workloads (simpler operations, native time-series UX). TimescaleDB when you need SQL, JOINs, or already have a PostgreSQL stack.
Technology Stack¶
Transport: MQTT 5.0, AMQP, CoAP, HTTP/2.
Brokers: EMQX, Mosquitto, Azure IoT Hub, AWS IoT Core, HiveMQ.
Streaming: Apache Kafka, Kafka Streams, Apache Flink, Redpanda.
Storage: InfluxDB, TimescaleDB, QuestDB, Apache Parquet (archive).
Serialization: Protocol Buffers, MessagePack, Avro, JSON.
Visualization: Grafana, custom dashboards.
Frequently Asked Questions¶
Why MQTT and not HTTP?
MQTT is designed for unreliable networks and resource-constrained devices. Persistent sessions, QoS guarantees, minimal overhead (2-byte header vs. hundreds of bytes with HTTP). For IoT telemetry, MQTT is the standard.
How many devices and messages can the pipeline handle?
It depends on the infrastructure. EMQX cluster: millions of concurrent connections, hundreds of thousands of messages per second. Kafka: millions of messages per second per cluster. Horizontally scalable.
What happens to data during a connectivity outage?
MQTT QoS 1/2 guarantees delivery. Devices store data locally during outages (store-and-forward). After connectivity is restored, data is sent in order. Dead letter queue for undeliverable messages.
Which time-series database should we choose?
InfluxDB for smaller deployments (simpler operations). TimescaleDB for larger ones (PostgreSQL compatibility, SQL queries, JOINs with relational data). QuestDB for ultra-low latency ingest.