Debezium captures database changes in real time: every INSERT, UPDATE, and DELETE is streamed to Kafka without putting query load on the source database.
Change Data Capture¶
CDC reads the database transaction log (PostgreSQL WAL, MySQL binlog) rather than querying tables, so it adds no query load to the source database.
```json
{
  "name": "postgres-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.dbname": "app",
    "topic.prefix": "cdc",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
```
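With the unwrap transform applied, a consumer of the `cdc.public.orders` topic sees a flat row rather than the full change envelope. An illustrative message (the column names here are hypothetical):

```json
{
  "id": 42,
  "customer_id": 7,
  "status": "paid"
}
```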
CDC → Data Lake¶
- Debezium → Kafka
- Flink/Spark → processing
- Delta/Iceberg/Hudi → upsert
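The last step of the pipeline, applying change events as upserts, can be sketched in pure Python. This is a stand-in for what a Delta/Iceberg/Hudi MERGE does; the event shape follows Debezium's envelope format (`op`, `before`, `after`), and the primary-key column `id` is an assumption:

```python
def apply_cdc_events(table, events):
    """Merge Debezium-style change events into a dict keyed by primary key."""
    for event in events:
        op = event["op"]  # c=create, u=update, d=delete, r=snapshot read
        if op in ("c", "u", "r"):
            row = event["after"]
            table[row["id"]] = row          # insert or overwrite: an upsert
        elif op == "d":
            table.pop(event["before"]["id"], None)  # delete by old key
    return table

orders = {}
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]
apply_cdc_events(orders, events)
# order 1 survives with its latest state; order 2 was created and then deleted
```

In a real lakehouse job the same logic runs as a `MERGE INTO` keyed on the primary key, with the latest event per key winning.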
Production Best Practices¶
In production environments, use Debezium with Kafka Connect in distributed mode for high availability. Set snapshot.mode according to your needs — initial for the first full database synchronization, schema_only if you only need new changes. The ExtractNewRecordState transform simplifies message structure from envelope format to flat JSON.
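What the unwrap transform does can be illustrated with a minimal Python sketch. It mimics the behavior of ExtractNewRecordState under default settings (the sample payload fields follow Debezium's envelope format; the row values are invented for illustration):

```python
def unwrap(envelope):
    """Sketch of ExtractNewRecordState: keep only the new row state,
    dropping the before/after envelope and source metadata."""
    payload = envelope["payload"]
    if payload["op"] == "d":
        return None  # by default, deletes become tombstone records
    return payload["after"]

event = {
    "payload": {
        "op": "u",
        "before": {"id": 7, "status": "new"},
        "after": {"id": 7, "status": "shipped"},
        "source": {"table": "orders"},
        "ts_ms": 1700000000000,
    }
}
flat = unwrap(event)  # just the new row state, as flat JSON
```

Downstream consumers then handle plain rows instead of needing to understand the Debezium envelope.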
Monitor the lag between the source database and Kafka topics using Debezium's metrics. For performance, consider column filtering with column.include.list so you transfer only the data you actually need. Be careful with schema migrations: depending on the connector and plugin, an ALTER TABLE on the source may be picked up automatically or may require reconfiguring the connector, so test migrations before running them in production. Debezium supports PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, and other databases.
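Column filtering is a plain connector property. A fragment like the following (table and column names are illustrative; Debezium expects fully-qualified `schema.table.column` names) restricts capture to three columns:

```json
{
  "column.include.list": "public.orders.id,public.orders.customer_id,public.orders.status",
  "snapshot.mode": "initial"
}
```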
Summary¶
Debezium is the de facto standard for CDC in the Kafka ecosystem: near real-time replication without adding query load to the source database.