Debezium captures database changes in near real time. Every INSERT, UPDATE, and DELETE is streamed to Kafka with minimal additional load on the source database.
Change Data Capture
CDC reads the database's transaction log (the WAL in PostgreSQL, the binlog in MySQL) instead of querying the tables, so it adds little load to the source database.
{
  "name": "postgres-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.dbname": "app",
    "topic.prefix": "cdc",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
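A connector definition like the one above is typically registered by sending it to the Kafka Connect REST API (`POST /connectors`). The following is a minimal sketch, not a definitive setup: the Connect address `http://connect:8083`, the helper name `build_register_request`, and the credential values are assumptions for illustration. The PostgreSQL connector also needs `database.user` and `database.password`, which the configuration above omits, so the sketch merges placeholder values in.

```python
import json
import urllib.request

# Connector definition copied from the configuration above.
CONNECTOR = {
    "name": "postgres-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.dbname": "app",
        "topic.prefix": "cdc",
        "table.include.list": "public.orders",
        "plugin.name": "pgoutput",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    },
}


def build_register_request(connect_url, connector, credentials):
    """Build (but do not send) the POST /connectors request.

    Hypothetical helper: `credentials` carries the connection properties
    that are not part of the original configuration.
    """
    payload = {
        "name": connector["name"],
        "config": {**connector["config"], **credentials},
    }
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_register_request(
        "http://connect:8083",  # assumed Kafka Connect address
        CONNECTOR,
        # Placeholder credentials, replace with real values.
        {"database.user": "debezium", "database.password": "change-me"},
    )
    print(request.get_method(), request.full_url)
    # To actually register the connector:
    # urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch runnable without a live Connect cluster; in practice the same payload can be posted with any HTTP client.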
CDC → Data Lake
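- Debezium → Kafka: row-level change events are published to Kafka topics
- Flink/Spark → processing: a stream job consumes and transforms the events
- Delta/Iceberg/Hudi → upsert: changes are merged into lake tables by key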
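The upsert step at the end of this pipeline can be illustrated with a small in-memory model. This is a simplified sketch under stated assumptions, not how Delta, Iceberg, or Hudi implement merges: it assumes events carry the full Debezium envelope (`op`, `before`, `after`) rather than the flattened form produced by the `unwrap` transform, and that the primary key column is named `id`. The function name `apply_changes` is hypothetical.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(table: Dict[Any, Row], events: Iterable[Row], key: str = "id") -> Dict[Any, Row]:
    """Apply change events, in order, to a table keyed by primary key."""
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, update: insert or overwrite the row
            row = event["after"]
            table[row[key]] = row
        elif op == "d":
            # delete: remove the row if present
            table.pop(event["before"][key], None)
        else:
            raise ValueError(f"unexpected op: {op!r}")
    return table


# Sample events; how much of `before` is populated depends on the
# table's REPLICA IDENTITY setting in PostgreSQL.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2}, "after": None},
]

orders = apply_changes({}, events)
print(orders)  # {1: {'id': 1, 'status': 'paid'}}
```

Because every event is applied by key, replaying the same events in the same order yields the same table, which is the property that makes at-least-once delivery from Kafka tolerable for this kind of sink.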
Summary
Debezium is the de facto standard for CDC in the Kafka ecosystem. It provides near real-time replication with minimal load on the source database.