Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TiCDC只能同步增量数据到kafka。。之前的全量数据该如何处理

TiCDC can only replicate incremental data to Kafka. How should the existing full data be handled?
Wouldn’t it be better to use a tool like Dumpling to export the full data as CSV for the downstream?
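For reference, a minimal Dumpling invocation for a CSV export might look like the sketch below; the connection flags and output path are placeholders for your own cluster.

```shell
# Minimal sketch: export the full data as CSV with Dumpling.
# -t sets export threads, -F caps the size of each output file;
# adjust user/host/port (and add -p for a password) to match your cluster.
tiup dumpling -u root -h 127.0.0.1 -P 4000 \
  --filetype csv -t 8 -F 256MiB -o /data/dumpling-export
```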
Is there an off-the-shelf tool for a heterogeneous downstream database (Elasticsearch), so that no R&D involvement or custom tool development is needed?
There are quite a few data synchronization tools that support heterogeneous environments.
If the downstream is homogeneous, you can consider using CloudCanal.
It seems like you want to do a heterogeneous migration and replace TiDB.
Kafka cannot replace TiDB. If you are switching to another database, you don’t need Kafka as an intermediary.
Use Dumpling or another tool to initialize the full data.
First perform a full load into the target database, then use TiCDC for the incremental part; one way to line the two phases up is sketched below.
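To make the handover consistent, you can use the snapshot TSO that Dumpling records in its `metadata` file (the `Pos` field) as the changefeed’s `start-ts`. A hedged sketch, with addresses, topic, and changefeed ID as placeholders (older TiCDC versions take `--pd` instead of `--server`):

```shell
# 1. Full export; Dumpling writes a `metadata` file recording the snapshot TSO.
tiup dumpling -u root -h 127.0.0.1 -P 4000 -o /data/full-export

# 2. Read the TSO from the metadata file (the `Pos` field).
grep 'Pos' /data/full-export/metadata

# 3. After loading the export into the target, start the changefeed from that TSO.
tiup cdc cli changefeed create \
  --server=http://127.0.0.1:8300 \
  --sink-uri='kafka://127.0.0.1:9092/ticdc-topic?protocol=canal-json' \
  --changefeed-id='full-then-incremental' \
  --start-ts=<Pos-from-metadata>
```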
You can consider using DataX. I have done historical data migration for enterprise databases with it; for TiDB you can use DataX’s MySQL plugin, since TiDB is MySQL-compatible.
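A DataX job is a JSON file pairing `mysqlreader` (pointed at TiDB’s MySQL-compatible port) with a writer plugin for the target; assuming DataX is unpacked under `/opt/datax` and a job file has already been written, a run looks like:

```shell
# Launch a DataX job; the JSON pairs mysqlreader (reading from TiDB on port 4000)
# with a writer for the target system. The paths here are assumptions.
python /opt/datax/bin/datax.py /opt/datax/job/tidb-to-target.json
```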
Use BR for a full backup; the backup records a timestamp (backup TS). Then, when creating the TiCDC changefeed, specify that timestamp as start-ts so replication continues from the moment of the full backup.
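A sketch of that flow, assuming local backup storage; the backup timestamp can be read back with `br validate decode`, and the PD/TiCDC addresses are placeholders:

```shell
# 1. Take a full backup with BR.
tiup br backup full --pd '127.0.0.1:2379' --storage 'local:///data/br-backup'

# 2. Recover the backup timestamp (backup TS) from the backup metadata.
tiup br validate decode --field='end-version' --storage 'local:///data/br-backup'

# 3. Create the TiCDC changefeed starting from that timestamp
#    (older TiCDC versions use --pd instead of --server).
tiup cdc cli changefeed create \
  --server=http://127.0.0.1:8300 \
  --sink-uri='kafka://127.0.0.1:9092/ticdc-topic?protocol=canal-json' \
  --start-ts=<backup-ts-from-step-2>
```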
It mainly depends on the data volume. If it is small, you can also touch a field such as the update time on every row, so that TiCDC captures those rows as change events.
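For a small table, that “touch” can be a single statement; the database, table, and column names below are hypothetical:

```shell
# Rewrite every row's update_time so TiCDC emits each row as a change event.
# Only sensible for small tables; mydb.mytable and update_time are made-up names.
mysql -h 127.0.0.1 -P 4000 -u root \
  -e "UPDATE mydb.mytable SET update_time = NOW();"
```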