
Event-Driven Architecture: From Monolith to Reactive Systems

29. 11. 2025 · 9 min read

Your monolith works. Orders are processed, payments go through, customers receive their emails. But every time you want to add a new feature, it takes months. Every deployment is a nightmare. And when one service goes down, everything goes down. This is the story of how to get out of that — and why event-driven architecture is not just a buzzword but a production-proven pattern.

Why the Synchronous World Is Not Enough

Traditional monolithic architecture works on the request-response principle. Service A calls Service B, waits for a response, then calls Service C. Everything is synchronous, tightly coupled and dependent on every component in the chain being available. It works — until the system grows.

Problems appear on three levels. First, temporal coupling: all services must be online simultaneously. Second, scalability: you cannot scale one part of the system independently. And third, evolvability: a change in one service cascades to all that depend on it.

Event-driven architecture (EDA) solves these problems with a fundamentally different approach. Instead of “call and wait,” it says: “something happened — whoever cares can read about it.” A producer emits an event and neither knows nor needs to know who consumes it. Consumers read messages at their own pace. Loosely coupled, asynchronous, resilient.

Apache Kafka as the System Backbone

When we talk about event-driven architecture at enterprise scale, we are talking about Apache Kafka. Alternatives exist — RabbitMQ, Apache Pulsar, NATS, AWS EventBridge — but Kafka has become the de facto standard for systems that need to process millions of events per second with delivery guarantees and persistence.

Kafka is not a message queue. It is a distributed commit log — a persistent, ordered, replicated record of everything that has happened in the system. Events are not deleted after reading. They remain in the log according to the configured retention policy (days, weeks, forever). This means a new consumer can start reading from the beginning and “replay” the entire history.
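As a minimal sketch of that replay capability (the group id, topic name and plain String values are illustrative assumptions): a consumer that joins under a brand-new group id with auto.offset.reset=earliest has no committed offsets, so it starts at the oldest retained record and works forward through the whole history.

```java
// Sketch: replaying the full retained history with a fresh consumer group.
// Group id, topic name and String deserialization are assumptions.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092");
props.put("group.id", "reporting-replay");      // brand-new group => no committed offsets
props.put("auto.offset.reset", "earliest");     // start at the oldest retained record
props.put("key.deserializer", StringDeserializer.class);
props.put("value.deserializer", StringDeserializer.class);

try (var consumer = new KafkaConsumer<String, String>(props)) {
    consumer.subscribe(List.of("orders.created"));
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            System.out.printf("offset=%d key=%s%n", record.offset(), record.key());
        }
    }
}
```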

Kafka Producer — Java Example

```java
// Kafka Producer — sending an OrderCreated event
Properties props = new Properties();
props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092");
props.put("key.serializer", StringSerializer.class);
props.put("value.serializer", KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://schema-registry:8081");
props.put("acks", "all");                  // acknowledge only after all in-sync replicas have the record
props.put("enable.idempotence", "true");   // deduplicate producer retries

var producer = new KafkaProducer<String, OrderCreated>(props);

var event = OrderCreated.newBuilder()
    .setOrderId("ORD-2026-00142")
    .setCustomerId("CUST-8837")
    .setAmount(new BigDecimal("24990.00"))
    .setCurrency("CZK")
    .setTimestamp(Instant.now())
    .build();

// The order id as the record key keeps all events for one order in the same partition.
producer.send(new ProducerRecord<>(
    "orders.created", event.getOrderId(), event
));
```

Kafka Consumer — Processing Events

```java
// Kafka Consumer — consumer group for the inventory service
@KafkaListener(
    topics = "orders.created",
    groupId = "inventory-service",
    containerFactory = "kafkaListenerFactory"
)
public void handleOrderCreated(OrderCreated event) {
    log.info("Reserving stock for order {}", event.getOrderId());
    try {
        inventoryService.reserveStock(event);
        publishEvent("inventory.reserved",
            new StockReserved(event.getOrderId()));
    } catch (InsufficientStockException e) {
        publishEvent("inventory.reservation-failed",
            new ReservationFailed(event.getOrderId(), e.getMessage()));
    }
}
```

Key details: acks=all ensures the event is written to all replicas before acknowledgement. enable.idempotence=true eliminates duplicates during network retries. And consumer groups enable horizontal scaling — adding more instances automatically redistributes partitions.
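The containerFactory referenced in the listener above could be wired roughly like this — a sketch using Spring for Apache Kafka, where the bean name matches the example but the concurrency value and consumer factory wiring are assumptions:

```java
// Sketch: listener container factory whose concurrency spreads partitions across threads;
// additional service instances joining the same consumer group take over partitions automatically.
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, OrderCreated> kafkaListenerFactory(
            ConsumerFactory<String, OrderCreated> consumerFactory) {
        var factory = new ConcurrentKafkaListenerContainerFactory<String, OrderCreated>();
        factory.setConsumerFactory(consumerFactory);
        // One consumer thread per partition, up to the topic's partition count.
        factory.setConcurrency(3);
        return factory;
    }
}
```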

Event Sourcing and CQRS

Event-driven architecture naturally leads to two advanced patterns that change how we think about data.

Event Sourcing — The Truth Lives in Events

In the traditional approach, you store the current state: the order has status “paid.” In event sourcing, you store the sequence of events that led to that state: OrderCreated → PaymentReceived → OrderConfirmed. The current state is derived — a computed view over the event history.

The benefits are substantial. You have a complete audit trail — you know not only what happened, but when, in what order and why. You can replay history and reconstruct the system state at any point in time. And you can add new projections over existing data without database migrations.

Event schema (Avro) — orders.avsc:

```json
{
  "type": "record",
  "name": "OrderEvent",
  "namespace": "cz.core.orders.events",
  "fields": [
    {"name": "eventId", "type": "string"},
    {"name": "eventType", "type": {
      "type": "enum",
      "name": "OrderEventType",
      "symbols": ["CREATED", "PAID", "SHIPPED", "DELIVERED", "CANCELLED"]
    }},
    {"name": "orderId", "type": "string"},
    {"name": "timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "payload", "type": "string"},
    {"name": "version", "type": "int", "default": 1}
  ]
}
```
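The “derived state” idea can be sketched as a fold over the event history. The OrderState record and the status strings below are hypothetical; the event type follows the Avro schema above:

```java
// Sketch: current state as a left fold over the ordered event history.
// OrderState and its status values are hypothetical; OrderEvent/OrderEventType
// are the Avro-generated classes from the schema above.
import java.util.List;

public record OrderState(String orderId, String status) {

    public static OrderState replay(String orderId, List<OrderEvent> history) {
        OrderState state = new OrderState(orderId, "NEW");
        for (OrderEvent event : history) {   // events must be applied in log order
            state = state.apply(event);
        }
        return state;
    }

    private OrderState apply(OrderEvent event) {
        return switch (event.getEventType()) {
            case CREATED   -> new OrderState(orderId, "CREATED");
            case PAID      -> new OrderState(orderId, "PAID");
            case SHIPPED   -> new OrderState(orderId, "SHIPPED");
            case DELIVERED -> new OrderState(orderId, "DELIVERED");
            case CANCELLED -> new OrderState(orderId, "CANCELLED");
        };
    }
}
```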

CQRS — Separating Reads and Writes

Command Query Responsibility Segregation separates the write model (commands that change state) from the read model (queries that read state). Combined with event sourcing, this means commands produce events into Kafka, and read models are materialised views optimised for specific queries.

A practical example: an e-shop writes orders as events. One consumer builds a SQL projection for the customer order detail. Another builds an Elasticsearch index for full-text search. A third computes real-time metrics in ClickHouse. Same data, three optimised views, zero coupling.
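A read-model projection is then just another consumer. Here is a sketch of the SQL projection from the example — the consumer group, table name and JdbcTemplate wiring are assumptions:

```java
// Sketch: building a SQL read model from the order event stream.
// Table name, consumer group and repository wiring are assumptions.
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderDetailProjection {

    private final JdbcTemplate jdbc;

    public OrderDetailProjection(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Separate consumer group: the projection scales and fails independently
    // of the inventory service reading the same topic.
    @KafkaListener(topics = "orders.created", groupId = "order-detail-projection")
    public void project(OrderCreated event) {
        // ON CONFLICT makes the insert idempotent under at-least-once delivery (PostgreSQL syntax).
        jdbc.update("""
            INSERT INTO order_detail (order_id, customer_id, amount, currency, created_at)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT (order_id) DO NOTHING
            """,
            event.getOrderId(), event.getCustomerId(),
            event.getAmount(), event.getCurrency(),
            java.sql.Timestamp.from(event.getTimestamp()));
    }
}
```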

Saga Pattern — Distributed Transactions

In a monolith, you have database transactions. In a distributed system, you do not. The saga pattern addresses distributed consistency as a sequence of local transactions where each step has a defined compensation action in case of failure.

Example: creating an order involves stock reservation, payment authorisation and sending a confirmation. If the payment fails, the saga automatically triggers compensation — it releases the stock reservation and creates a cancelled event. Two approaches exist:

  • Choreography: each service reacts to events and publishes its own. Decentralised, simple for small sagas, but hard to trace in complex flows.
  • Orchestration: a central orchestrator directs the sequence of steps. Clearer, better monitoring, but a single point of coordination.

In practice, for enterprise systems we prefer orchestration. The reason is pragmatic: when a saga fails at step three of five, you want to see clearly where it got stuck and have a single place for retry logic. Choreography works for simple flows (2–3 steps), but for complex business processes it becomes unmaintainable.
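A heavily simplified orchestrator sketch for the order saga described above — the command and event classes, topic names and the publish helper are all assumptions; real implementations also persist saga state and handle timeouts:

```java
// Sketch: orchestrated order saga. Each step is a local transaction in another
// service; each failure triggers compensation for the steps already completed.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderSagaOrchestrator {

    @KafkaListener(topics = "orders.created", groupId = "order-saga")
    public void onOrderCreated(OrderCreated event) {
        publish("inventory.reserve-stock", new ReserveStockCommand(event.getOrderId()));
    }

    @KafkaListener(topics = "inventory.reserved", groupId = "order-saga")
    public void onStockReserved(StockReserved event) {
        publish("payments.authorize", new AuthorizePaymentCommand(event.getOrderId()));
    }

    @KafkaListener(topics = "payments.authorization-failed", groupId = "order-saga")
    public void onPaymentFailed(PaymentFailed event) {
        // Compensation: undo the completed step and close the saga as cancelled.
        publish("inventory.release-stock", new ReleaseStockCommand(event.getOrderId()));
        publish("orders.cancelled", new OrderCancelled(event.getOrderId(), "payment failed"));
    }

    private void publish(String topic, Object payload) {
        // e.g. kafkaTemplate.send(topic, payload) — wiring omitted in this sketch
    }
}
```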

Dead Letter Queues — When Events Fail

In a distributed system, events will fail. Invalid data, an unavailable service, a bug in consumer logic. The question is not if, but how you respond.

A Dead Letter Queue (DLQ) is a topic where events are moved after a configured number of failed processing attempts. Instead of an infinite retry loop or data loss, the event “parks” in the DLQ, where it awaits manual or automated remediation.

  • Retry strategy: exponential backoff with jitter — 1 s, 2 s, 4 s, 8 s plus a random offset to prevent a thundering-herd effect (a configuration sketch follows this list)
  • DLQ monitoring: an alert on every event landing in the DLQ, plus a dashboard with counts and message age
  • Reprocessing: tooling for replaying events from the DLQ back to the source topic once the bug is fixed
  • Poison message isolation: identifying events that repeatedly crash the consumer and parking them without blocking the rest of the partition
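With Spring Kafka, the retry-then-DLQ behaviour can be expressed as an error-handler bean — a sketch with illustrative back-off values (the jitter mentioned above would need a custom BackOff and is left out here). By default the recoverer publishes the failed record to a topic named after the source topic with a .DLT suffix:

```java
// Sketch: retry with exponential backoff, then route the record to a dead letter topic.
// Back-off values are illustrative assumptions.
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.kafka.support.serializer.DeserializationException;
import org.springframework.util.backoff.ExponentialBackOff;

@Bean
public DefaultErrorHandler kafkaErrorHandler(KafkaTemplate<Object, Object> template) {
    // After retries are exhausted, park the record in the "<topic>.DLT" topic.
    var recoverer = new DeadLetterPublishingRecoverer(template);

    // 1s, 2s, 4s, ... capped at 30s per attempt and ~2 minutes total.
    var backOff = new ExponentialBackOff(1_000L, 2.0);
    backOff.setMaxInterval(30_000L);
    backOff.setMaxElapsedTime(120_000L);

    var handler = new DefaultErrorHandler(recoverer, backOff);
    // Do not retry errors that can never succeed — send poison messages straight to the DLQ.
    handler.addNotRetryableExceptions(DeserializationException.class);
    return handler;
}
```

The handler is then attached to the listener container factory shown earlier via setCommonErrorHandler.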

Schema Registry: Avro vs Protobuf

The moment you have dozens of services producing and consuming events, you need a contract. Schema Registry ensures that producer and consumer agree on message format — and that schema evolution does not break existing consumers.

Apache Avro

Native integration with the Kafka ecosystem. Schema evolution with backward/forward compatibility. Compact binary format. Confluent Schema Registry as the standard. Ideal for Kafka-first architectures.

Protocol Buffers

Strong typing, excellent code generation for Java, Go, Python, TypeScript. Lower overhead for nested structures. Better tooling for gRPC. We prefer it for polyglot environments with multiple languages.

Our recommendation: if you are all-in on the Kafka ecosystem (Confluent Platform, Kafka Streams, ksqlDB), choose Avro. If you have a polyglot environment with gRPC communication between services, Protobuf makes more sense. In both cases, the critical step is to set the compatibility mode (BACKWARD or FULL) on Schema Registry — without it, schema evolution is a ticking time bomb.
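To make the evolution rules concrete: under BACKWARD compatibility, adding a field with a default value to the OrderEvent schema above is a safe change — the new schema can still read records written before the field existed, and old consumers simply ignore it. The correlationId field below is a hypothetical example:

```json
// Added to the "fields" array of the OrderEvent schema above (hypothetical field).
// The default value is what makes the change BACKWARD-compatible.
{"name": "correlationId", "type": "string", "default": ""}
```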

Real-Time Analytics on the Event Stream

One of the greatest strengths of event-driven architecture is the ability to build real-time analytics directly over the event stream — with no batch ETL processes, no waiting overnight for reports.

The typical stack looks like this: Kafka as the event source → Kafka Streams or Apache Flink for stream processing (aggregation, windowing, enrichment) → ClickHouse or Apache Druid as the analytics database → Grafana or a custom dashboard for visualisation.
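A minimal Kafka Streams sketch of the middle step — a per-minute order count per currency. The topic name, the Avro serde and the state-store name are assumptions; in production the result would be written to an analytics topic or queried from the state store:

```java
// Sketch: tumbling one-minute windows counting orders per currency.
// orderCreatedSerde and the store name are assumptions.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import java.time.Duration;

StreamsBuilder builder = new StreamsBuilder();

builder.stream("orders.created", Consumed.with(Serdes.String(), orderCreatedSerde))
    // Re-key by currency so the aggregation groups all orders of one currency together.
    .groupBy((orderId, order) -> order.getCurrency(),
             Grouped.with(Serdes.String(), orderCreatedSerde))
    // Tumbling one-minute windows, no grace period for late records.
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .count(Materialized.as("orders-per-minute"))
    .toStream()
    .foreach((window, count) ->
        System.out.printf("%s %s -> %d orders%n", window.window(), window.key(), count));
```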

Practical use cases we implement:

  • Fraud detection: real-time transaction scoring with <100ms latency. Anomalies are detected in the stream, not in a batch report the next morning
  • Inventory tracking: current stock level as a materialised view over the event stream (received, shipped, returned)
  • Business KPIs: live dashboard with GMV, conversion rate, average order value — updated every second, not every day
  • Alerting: automatic anomaly detection — a 30% drop in orders within an hour, a spike in error rate, unusual payment patterns

How We Build It at CORE SYSTEMS

Event-driven architecture is not something you switch on overnight. It is a transformational journey that requires a shift in both mindset and tooling. At CORE SYSTEMS, we have a proven approach for guiding clients through this transformation.

We start with an event storming workshop — a technique where we map business processes as sequences of events together with domain experts. The output is an event model: what events exist, who produces them, who consumes them and what the dependencies are. This is the foundation on which we build the technical architecture.

Then we choose the strangler fig pattern for gradual migration from the monolith. We do not do big-bang rewrites. Instead, we identify the bounded context with the greatest business value as a standalone service, extract it, connect it via Kafka, and only once it runs in production do we continue with the next.

Our standard stack for event-driven systems includes: Apache Kafka (Confluent Platform or Amazon MSK), Schema Registry with Avro/Protobuf, Kafka Streams or Flink for stream processing, PostgreSQL or MongoDB for materialised views, and OpenTelemetry with distributed tracing for observability across the entire event flow.

Every system includes from the start: a DLQ with monitoring, schema compatibility checks in CI/CD, end-to-end tracing via correlation ID, and a runbook for operators — because an event-driven system that nobody knows how to operate is worse than a monolith.
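Correlation IDs typically travel as Kafka record headers. A minimal sketch, reusing the producer and event from the earlier example — the header name is a convention we assume here, not a Kafka standard:

```java
// Sketch: propagating a correlation id as a Kafka record header.
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

String correlationId = UUID.randomUUID().toString();

var record = new ProducerRecord<>("orders.created", event.getOrderId(), event);
record.headers().add("X-Correlation-Id", correlationId.getBytes(StandardCharsets.UTF_8));
producer.send(record);

// On the consuming side the same header is read back (e.g. via a @Header parameter
// in the @KafkaListener method) and attached to logs and trace spans.
```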

Conclusion: Events Are the New Source of Truth

Event-driven architecture changes how we think about data and communication between services. Instead of shared databases, we have shared events. Instead of synchronous chains, we have asynchronous reactive systems. Instead of “current state,” we have complete history.

But be careful: EDA is not a silver bullet. It brings its own complexity — eventual consistency, distributed debugging, schema management, ordering guarantees. The key to success is a pragmatic approach: start with one bounded context, prove the value, then scale. Exactly how we do it at CORE SYSTEMS.
