Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Does DM support syncing from TiDB to TiDB? (DM支持从tidb同步到tidb吗?)
The historical TiDB table has 1.4 billion rows of data. I now want to convert it to a hash-partitioned table. According to the official documentation, the only option is to create a new partitioned table and migrate the data manually. I thought of using INSERT ... SELECT, but I'm worried about the impact, so I'm asking whether DM supports synchronization from TiDB to TiDB, or whether there is a safer way to copy the data.
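For reference, a minimal sketch of the INSERT ... SELECT idea described above, assuming a hypothetical source table `orders` with primary key `id` and a hash-partition column `user_id` (all names, ranges, and addresses here are placeholders, not the real schema):

```bash
# Sketch of the "INSERT ... SELECT" approach; orders / orders_hash / id / user_id
# are illustrative names only.
mysql -h 127.0.0.1 -P 4000 -u root <<'SQL'
-- New hash-partitioned table. In TiDB every unique key, including the primary
-- key, must contain the partition column, so the PK becomes (id, user_id).
CREATE TABLE orders_hash (
  id      BIGINT NOT NULL,
  user_id BIGINT NOT NULL,
  amount  DECIMAL(10,2),
  PRIMARY KEY (id, user_id)
)
PARTITION BY HASH (user_id) PARTITIONS 16;

-- Copy the data in id ranges so each statement stays well under the
-- single-transaction size limit; a script would loop over the ranges.
INSERT INTO orders_hash
SELECT id, user_id, amount
FROM   orders
WHERE  id >= 1 AND id < 1000000;
SQL
```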
DM supports data migration from databases compatible with the MySQL protocol (MySQL, MariaDB, Aurora MySQL) to TiDB.
Not supported, consider TiCDC.
DM connects to the upstream database as a slave (in the CHANGE MASTER sense), so the upstream must support the MySQL master-slave replication protocol, which TiDB does not.
Not supported. You can consider TiCDC or TiDB Binlog.
DM pretends to be a MySQL slave in order to pull and parse binlogs, so the upstream cannot be TiDB.
After some research, it seems that TiCDC does not support full data synchronization.
Got it, is there any good solution?
If the upstream is MySQL and the downstream is TiDB, use DM for synchronization.
If the upstream is TiDB and the downstream is MySQL/TiDB/Kafka, use TiCDC or TiDB-Binlog.
I understand that TiCDC and TiDB Binlog only synchronize incremental data, so how do you synchronize the full data?
Full data can be migrated with BR or Dumpling + Lightning. Both BR and Dumpling record a timestamp (TSO) when the backup is taken. Then use TiCDC or TiDB Binlog for incremental synchronization, with the incremental start point --start-ts set to the backup timestamp. You must make sure that --start-ts is not earlier than the GC safepoint.
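A rough sketch of that full-plus-incremental flow using BR and TiCDC; the PD addresses, storage path, and sink URI below are placeholders, and flag details may vary slightly between versions:

```bash
# 1. Full backup with BR; the output reports a backup timestamp (BackupTS, a TiDB TSO).
br backup full --pd "10.0.0.1:2379" --storage "local:///data/backup"

# 2. Restore the backup into the target cluster.
br restore full --pd "10.0.0.2:2379" --storage "local:///data/backup"

# 3. Start incremental replication from the backup timestamp with TiCDC.
#    <BackupTS> is the TSO printed in step 1; it must not be older than the
#    upstream GC safepoint (extend tidb_gc_life_time before the backup if needed).
tiup cdc cli changefeed create \
  --pd="http://10.0.0.1:2379" \
  --sink-uri="mysql://root:password@10.0.0.2:4000/" \
  --start-ts=<BackupTS>
```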
Thank you, bro. I tried using BR to synchronize the data, but after partitioning the new table the table structure changed: the primary key went from the single ID column to ID + the hash partition column, and the BR full restore reported a structure mismatch.
Additionally, I suspect that Dumpling + Lightning exports and imports the data row by row with INSERT statements. If that's the case, it might be faster for me to write a program that batch-inserts, since there are 1.4 billion rows of historical data.
Lightning's Local-backend import mode does not insert rows one at a time with INSERT statements, so it is fairly fast.
https://docs.pingcap.com/zh/tidb/stable/tidb-lightning-backends#tidb-lightning-后端
Lightning does not use INSERT statements; it has three backend modes. When you use the Local-backend mode it ingests data directly into TiKV, and the speed is not bad. Besides, this is not just about speed: you need a well-defined starting timestamp for TiCDC. Data written by your own script would not all sit at the same timestamp unless you inserted it in a single transaction, and TiDB limits the size of a single transaction, so inserting 1.4 billion rows in one transaction is impossible.
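A minimal sketch of the Dumpling export + Lightning Local-backend import described above; hosts, credentials, and paths are placeholders, and the new partitioned table is assumed to have been created in the target schema beforehand:

```bash
# Export the source table with Dumpling (db/table names are illustrative).
tiup dumpling -h 10.0.0.1 -P 4000 -u root -o /data/export --filetype sql -T "mydb.orders"

# Minimal Lightning config using the local backend, which sorts the data into
# KV files and ingests them directly into TiKV instead of running INSERTs.
cat > tidb-lightning.toml <<'EOF'
[tikv-importer]
backend = "local"
sorted-kv-dir = "/data/sorted-kv"

[mydumper]
data-source-dir = "/data/export"

[tidb]
host = "10.0.0.2"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "10.0.0.2:2379"
EOF

tiup tidb-lightning -config tidb-lightning.toml
```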