Debezium captures database changes in real time: every INSERT, UPDATE, and DELETE is streamed to Kafka without putting query load on the source database.
Change Data Capture¶
CDC reads the database transaction log (PostgreSQL WAL, MySQL binlog) rather than querying tables, so it adds no query load to the source database.
```json
{
  "name": "postgres-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.dbname": "app",
    "topic.prefix": "cdc",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
```
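With the unwrap transform applied, a consumer of the `cdc.public.orders` topic sees a flat row rather than the full change envelope. An illustrative message (the column names here are hypothetical):

```json
{
  "id": 42,
  "customer_id": 7,
  "status": "paid"
}
```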
CDC → Data Lake¶
- Debezium → Kafka
- Flink/Spark → processing
- Delta/Iceberg/Hudi → upsert
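The last step of the pipeline, applying change events as upserts, can be sketched in pure Python. This is a stand-in for what a Delta/Iceberg/Hudi MERGE does; the event shape follows Debezium's envelope format (`op`, `before`, `after`), and the primary-key column `id` is an assumption:

```python
def apply_cdc_events(table, events):
    """Merge Debezium-style change events into a dict keyed by primary key."""
    for event in events:
        op = event["op"]  # c=create, u=update, d=delete, r=snapshot read
        if op in ("c", "u", "r"):
            row = event["after"]
            table[row["id"]] = row          # insert or overwrite: an upsert
        elif op == "d":
            table.pop(event["before"]["id"], None)  # delete by old key
    return table

orders = {}
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]
apply_cdc_events(orders, events)
# order 1 survives with its latest state; order 2 was created and then deleted
```

In a real lakehouse job the same logic runs as a `MERGE INTO` keyed on the primary key, with the latest event per key winning.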
Production Best Practices¶
In production environments, use Debezium with Kafka Connect in distributed mode for high availability. Set snapshot.mode according to your needs — initial for the first full database synchronization, schema_only if you only need new changes. The ExtractNewRecordState transform simplifies message structure from envelope format to flat JSON.
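What the unwrap transform does can be illustrated with a minimal Python sketch. It mimics the behavior of ExtractNewRecordState under default settings (the sample payload fields follow Debezium's envelope format; the row values are invented for illustration):

```python
def unwrap(envelope):
    """Sketch of ExtractNewRecordState: keep only the new row state,
    dropping the before/after envelope and source metadata."""
    payload = envelope["payload"]
    if payload["op"] == "d":
        return None  # by default, deletes become tombstone records
    return payload["after"]

event = {
    "payload": {
        "op": "u",
        "before": {"id": 7, "status": "new"},
        "after": {"id": 7, "status": "shipped"},
        "source": {"table": "orders"},
        "ts_ms": 1700000000000,
    }
}
flat = unwrap(event)  # just the new row state, as flat JSON
```

Downstream consumers then handle plain rows instead of needing to understand the Debezium envelope.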
Monitor the lag between the source database and Kafka topics using Debezium's metrics. For performance, consider column filtering with column.include.list so you transfer only the data you actually need. Be careful with schema migrations: depending on the connector and plugin, an ALTER TABLE on the source may be picked up automatically or may require reconfiguring the connector, so test migrations before running them in production. Debezium supports PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, and other databases.
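Column filtering is a plain connector property. A fragment like the following (table and column names are illustrative; Debezium expects fully-qualified `schema.table.column` names) restricts capture to three columns:

```json
{
  "column.include.list": "public.orders.id,public.orders.customer_id,public.orders.status",
  "snapshot.mode": "initial"
}
```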
Summary¶
Debezium is the de facto standard for CDC in the Kafka ecosystem: near real-time replication without adding query load to the source database.