Does DM support synchronization from TiDB to TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM支持从tidb同步到tidb吗? (Does DM support synchronization from TiDB to TiDB?)

| username: codingBoyYe

The historical TiDB table has 1.4 billion rows of data. I now want to convert it to a hash-partitioned table. According to the official documentation, the only option is to create a new partitioned table and migrate the data manually. I thought of using INSERT INTO ... SELECT, but I am worried about the impact on the cluster. So I am asking whether DM supports synchronization from TiDB to TiDB, or whether there is a safer way to synchronize the data.
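
For illustration, a minimal sketch of the "new partitioned table + INSERT ... SELECT" idea. The table and column names (`orders`, `id`, `user_id`) are made up, not from the thread; in TiDB the hash partition column has to be part of the primary key, which is why the key definition changes:

```shell
# Hypothetical sketch: create the new hash-partitioned table, then copy data
# over with INSERT INTO ... SELECT in bounded ranges.
mysql -h 127.0.0.1 -P 4000 -u root <<'SQL'
CREATE TABLE orders_new (
    id      BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    amount  DECIMAL(10,2),
    PRIMARY KEY (id, user_id)          -- partition column must be in the primary key
) PARTITION BY HASH (user_id) PARTITIONS 16;

-- Copying all 1.4 billion rows in one statement would exceed TiDB's
-- single-transaction size limit, so the copy must be split into ranges.
INSERT INTO orders_new
SELECT * FROM orders WHERE id BETWEEN 1 AND 1000000;
SQL
```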

| username: wisdom | Original post link

DM supports data migration from databases compatible with the MySQL protocol (MySQL, MariaDB, Aurora MySQL) to TiDB.

| username: sunzhaoyang | Original post link

Not supported, consider TiCDC.

| username: ddhe9527 | Original post link

DM connects to the upstream database as a slave using the CHANGE MASTER method, meaning the upstream database must support the MySQL master-slave protocol, which TiDB does not support.

| username: tjdagang1 | Original post link

Not supported. You can consider TiCDC or TiDB Binlog.

| username: hey-hoho | Original post link

DM is disguised as a MySQL Slave to parse binlogs, so the upstream cannot be TiDB.

| username: codingBoyYe | Original post link

After some research, it seems that TiCDC does not support full data synchronization.

| username: codingBoyYe | Original post link

Got it, is there any good solution?

| username: ddhe9527 | Original post link

If the upstream is MySQL and the downstream is TiDB, use DM for synchronization.
If the upstream is TiDB and the downstream is MySQL/TiDB/Kafka, use TiCDC or TiDB-Binlog.

| username: codingBoyYe | Original post link

I understand that TiCDC and TiDB-Binlog only synchronize incremental data, so how do you synchronize the full data?

| username: ddhe9527 | Original post link

Full data can be transferred using BR or the Dumpling + Lightning tools. BR and Dumpling record a timestamp during the backup. Then use TiCDC or TiDB-Binlog for incremental synchronization, with the starting point (`--start-ts`) set to the backup timestamp. You must ensure that `--start-ts` is not earlier than the GC safepoint.
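
A rough sketch of this flow, assuming Dumpling for the full export and TiCDC for the increment (hosts, paths, and the GC window below are placeholders):

```shell
# Extend GC life time first so the backup timestamp stays ahead of the GC
# safepoint while the full data is being imported.
mysql -h 127.0.0.1 -P 4000 -u root -e "SET GLOBAL tidb_gc_life_time = '48h';"

# Full export with Dumpling. The output directory contains a `metadata` file;
# its Pos field under "SHOW MASTER STATUS" is the snapshot TSO of the backup.
tiup dumpling -h 127.0.0.1 -P 4000 -u root -o /data/export --filetype sql -t 8 -r 200000
grep -A 2 'SHOW MASTER STATUS' /data/export/metadata   # note the Pos value (TSO)

# After the full data has been imported into the new table (e.g. with Lightning),
# start incremental replication from that TSO with TiCDC.
tiup cdc cli changefeed create \
  --pd=http://127.0.0.1:2379 \
  --sink-uri="mysql://root:@<downstream-host>:4000/" \
  --start-ts=<Pos-from-metadata>
```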

| username: codingBoyYe | Original post link

Thank you, bro. I tried using BR to synchronize the data, but the new partitioned table has a different structure: the primary key changed from the single ID field to ID + the hash partition field, and BR full restore reported a structure mismatch.
Also, I suspect that Dumpling + Lightning imports the data row by row with INSERT statements. If that's the case, it might be faster for me to write a program that does batch inserts, since there are 1.4 billion rows of historical data.

| username: 啦啦啦啦啦 | Original post link

Lightning's local-backend import mode does not insert rows one by one with INSERT statements, so it is relatively fast.
https://docs.pingcap.com/zh/tidb/stable/tidb-lightning-backends#tidb-lightning-后端
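
For illustration, a minimal local-backend setup might look like the following; the directory paths and connection details are placeholders, not from the thread:

```shell
# Hypothetical tidb-lightning config using the local backend: data is sorted
# into SST files on local disk and ingested into TiKV directly, instead of
# being replayed as INSERT statements.
cat > tidb-lightning.toml <<'EOF'
[tikv-importer]
backend = "local"
sorted-kv-dir = "/mnt/ssd/sorted-kv-dir"   # fast local disk for sorting

[mydumper]
data-source-dir = "/data/export"           # Dumpling output directory

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "127.0.0.1:2379"
EOF

tiup tidb-lightning -config tidb-lightning.toml
```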

| username: ddhe9527 | Original post link

Lightning does not insert row by row; it has three backend modes. With the local-backend mode it ingests the data directly into TiKV, which is quite fast. And this is not just a matter of speed: you need to give TiCDC a starting timestamp, and data copied by your own script will not all belong to the same snapshot timestamp unless you insert it in a single transaction. TiDB limits the size of a single transaction, so inserting 1.4 billion rows in one transaction is impossible.

| username: system | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.