Add New Tables to Existing Migration Tasks in DM Synchronization

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM同步为已有迁移任务增加新同步的表

| username: Billmay表妹

Thanks to @seiang for the contribution.

DM is a convenient data migration tool that supports full data migration and incremental data synchronization from databases compatible with the MySQL protocol (MySQL, MariaDB, Aurora MySQL) to TiDB. Using the DM tool helps simplify the data migration process and reduce the operational costs of data migration.

DM not only plays an important role in the migration from MySQL to TiDB but also in near real-time synchronization of upstream MySQL data to the TiDB cluster for complex AP queries.

In actual data synchronization scenarios, it is often encountered that an existing data synchronization task is running, but new tables from the upstream MySQL need to be added to the migration task. This article discusses how to add new tables to an existing DM synchronization task.

Below is the official documentation for handling different stages of an existing data migration task:

1. Migration Task Currently in the Dump Stage

Since MySQL does not support specifying a snapshot for export, it does not support updating the migration task and restarting to continue the export from a breakpoint during the export process. Therefore, it is not possible to dynamically add tables to be migrated at this stage.

If additional tables need to be added for migration, it is recommended to restart the migration task with a new configuration file.

2. Migration Task Currently in the Load Stage

Different data migration tasks usually correspond to different binlog positions during export. If they are merged and imported during the Load stage, it is impossible to achieve consistency in binlog positions. Therefore, it is not recommended to add tables to be migrated during the Load stage.

3. Migration Task Currently in the Sync Stage

When the data migration task is already in the Sync stage, adding additional tables in the configuration file and restarting the task will not cause DM to re-execute the full export and import for the new tables. Instead, it will continue incremental replication from the previous breakpoint.

Therefore, if the full data of the new tables has not been imported into the downstream, a separate data migration task needs to be used to export and import the full data of the new tables into the downstream.

Record the position information of the global checkpoint (is_global=1) corresponding to the existing migration task as checkpoint-T, such as (mysql-bin.000100, 1234). Record the position information of the full export metadata (or the checkpoint of another data migration task in the Sync stage) of the tables to be added as checkpoint-S, such as (mysql-bin.000099, 5678). The tables can be added to the migration task through the following steps:

  • The checkpoint table to be updated is {dm_meta} library’s {task-name}_syncer_checkpoint.
  • The checkpoint row to be updated is id={source-id} and is_global=1.
  • The checkpoint columns to be updated are binlog_name and binlog_pos.
  1. Use stop-task to stop the existing migration task. If the tables to be added belong to another running migration task, stop it as well.
  2. Use a MySQL client to connect to the downstream TiDB database and manually update the checkpoint information of the existing migration task to the smaller value between checkpoint-T and checkpoint-S (in this example, (mysql-bin.000099, 5678)).
  3. Set safe-mode: true in the syncers section of the migration task configuration to ensure idempotent execution.
  4. Start the migration task using start-task.
  5. Observe the migration task status using query-status. When syncerBinlog exceeds the larger value between checkpoint-T and checkpoint-S (in this example, (mysql-bin.000100, 1234)), restore safe-mode to its original value and restart the migration task.

Below is a practical case of adding new tables to an existing data synchronization task based on the official documentation and actual business scenarios:

Requirement:

Currently, a certain business database already has a DM synchronization task synchronizing data from upstream MySQL to the TiDB cluster. The synchronization task involves 56 tables. Due to business iteration, two new tables (user2gameapp, users_info) need to be synchronized from upstream MySQL to the TiDB cluster.

The existing DM synchronization task configuration file is as follows:

name: "tidb-xxx-task"
case-sensitive: true
meta-schema: "dm_meta"
task-mode: "all"

target-database:
  host: "10.xx.xx.xx"
  port: 4000
  user: "syncuser"
  password: "xxxxxxxxxxxx"

mysql-instances:
  -
   source-id: "mysql-xxx"
   block-allow-list: "global"
   route-rules: ["mysql-xxx-rule"]     
   mydumper-config-name: "global"        
   loader-config-name: "global"           
   syncer-config-name: "global"            

block-allow-list:
  global:                            
    do-dbs: ["RTXXX"]                          
    do-tables:
    - db-name: "RTXXX"   
      tbl-name: "xxx1_game"
    - db-name: "RTXXX"   
      tbl-name: "xxx2_game"
    - db-name: "RTXXX"   
      tbl-name: "xxx_ad"
   .....

routes:
    mysql-xxx-rule:
      schema-pattern: "RTXXX"
      target-schema: "RTXXX_TiDB"
        
mydumpers:
  global:                           
    threads: 4
        
loaders:                           
  global:
    pool-size: 16
        
syncers:                            
  global:                            
    worker-count: 16
    batch: 100

Operation Process

1. Export the two new tables to be synchronized from upstream MySQL using dumpling

# dumpling -h 10.xx.xx.xx -P 3306 -uxxxx -p --filetype sql -t 8 -F 64MiB -B RTXXX -T $(cat /data/xx.lst)  -o /data/RTXXX

2. Check the metadata record of the exported SQL file, which will be used later to modify the global checkpoint

# cat metadata
Started dump at: 2022-04-28 12:19:43
SHOW MASTER STATUS:
        Log: mysqlbin.025306
        Pos: 615988532
        GTID:27a0991c-b504-11ea-9bc4-005056b73203:1-5220,
c492d887-b503-11ea-8362-005056b7b179:1-1665110007

SHOW SLAVE STATUS:
        Host: 10.xx.xx.xx
        Log: mysqlbin.025403
        Pos: 616318608
        GTID:27a0991c-b504-11ea-9bc4-005056b73203:1-5220,
c492d887-b503-11ea-8362-005056b7b179:1-1665110007
Finished dump at: 2022-04-28 12:19:44

3. Import the exported SQL file into the TiDB cluster using tidb-lightning

tidb-lightning configuration file:

# cat tidb-lightning.toml
[lightning]
level = "info"
file = "/xxx/tidb-lightning.log"
pprof-port = 8289
status-addr = '0.0.0.0:8289'

[tikv-importer]
backend = "tidb"
on-duplicate = "error"

[mydumper]
read-block-size = 65536
batch-size = 1262485504
data-source-dir = "/data/RTXXX"
no-schema = false

filter = ['*.*', '!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*', '!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']

[tidb]
host = "10.xx.xx.xx"
port = 4000
user = "syncuser"
password = "xxxxxxx"
status-port = 10080
pd-addr = "10.xx.xx.xx:2379"

[checkpoint]
enable = true

Import data:

# nohup tidb-lightning -config /xxx/tidb-lightning.toml > nohup.out &

4. After the data import is complete, stop the existing synchronization task

$ tiup dmctl --master-addr 10.xx.xx.xx:8261 stop-task tidb-xxx-task   
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.4/dmctl/dmctl --master-addr 10.xx.xx.xx:8261 stop-task tidb-xxx-task 
{
    "op": "Stop",
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql-xxx",
            "worker": "dm-10.xx.xx.xx-8262"
        }
    ]
}

5. Modify the {dm_meta} library’s {task-name}_syncer_checkpoint

mysql> select * from `tidb-xxx-task_syncer_checkpoint` where is_global=1\G
*************************** 1. row ***************************
                   id: mysql-xxx
            cp_schema:
             cp_table:
          binlog_name: mysqlbin.025306
           binlog_pos: 720283200
          binlog_gtid: 27a0991c-b504-11ea-9bc4-005056b73203:1-5220,c492d887-b503-11ea-8362-005056b7b179:1-1665117758
exit_safe_binlog_name:
exit_safe_binlog_pos: 0
exit_safe_binlog_gtid:
           table_info: null
            is_global: 1
          create_time: 2022-04-28 11:32:31
          update_time: 2022-04-28 12:30:47
1 row in set (0.02 sec)

Modify the global checkpoint. If GTID is enabled, it also needs to be modified here.

mysql> update `tidb-xxx-task_syncer_checkpoint` set binlog_name='mysqlbin.025306',binlog_pos='615988532',binlog_gtid='27a0991c-b504-11ea-9bc4-005056b73203:1-5220,c492d887-b503-11ea-8362-005056b7b179:1-1665110007' where is_global=1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

6. Modify the DM synchronization task configuration file to add the two new tables

name: "tidb-xxx-task"
case-sensitive: true
meta-schema: "dm_meta"
task-mode: "all"

target-database:
  host: "10.xx.xx.xx"
  port: 4000
  user: "syncuser"
  password: "xxxxxxxxxxxx"

mysql-instances:
  -
   source-id: "mysql-xxx"
   block-allow-list: "global"
   route-rules: ["mysql-xxx-rule"]     
   mydumper-config-name: "global"        
   loader-config-name: "global"           
   syncer-config-name: "global"            

block-allow-list:
  global:                            
    do-dbs: ["RTXXX"]                          
    do-tables:
    - db-name: "RTXXX"   
      tbl-name: "xxx1_game"
    - db-name: "RTXXX"   
      tbl-name: "xxx2_game"
    - db-name: "RTXXX"   
      tbl-name: "xxx_ad"
    - db-name: "RTXXX"
      tbl-name: "user2gameapp"
    - db-name: "RTXXX"
      tbl-name: "users_info"
   .....

routes:
    mysql-xxx-rule:
      schema-pattern: "RTXXX"
      target-schema: "RTXXX_TiDB"
        
mydumpers:
  global:                           
    threads: 4
        
loaders:                           
  global:
    pool-size: 16
        
syncers:                            
  global:                            
    worker-count: 16
    batch: 100

7. Restart the synchronization task

$ tiup dmctl --master-addr 10.xx.xx.xx:8261 start-task ./task_xxx.yaml
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.4/dmctl/dmctl --master-addr 10.xx.xx.xx:8261 start-task ./task_xxx.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql-xxx",
            "worker": "dm-10.xx.xx.xx-8262"
        }
    ]
}

8. Check the synchronization task status

$ tiup dmctl --master-addr 10.xx.xx.xx:8261 query-status tidb-xxx-task
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.4/dmctl/dmctl --master-addr 10.xx.xx.xx:8261 query-status tidb-xxx-task
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "mysql-xxx",
                "worker": "dm-10.xx.xx.xx-8262",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "tidb-xxx-task",
                    "stage": "Running",
                    "unit": "Sync",
                    "result": null,
                    "unresolvedDDLLockID": "",
                    "sync": {
                        "totalEvents": "292390",
                        "totalTps": "9702",
                        "recentTps": "9702",
                        "masterBinlog": "(mysqlbin.025307, 28133809)",
                        "masterBinlogGtid": "27a0991c-b504-11ea-9bc4-005056b73203:1-5220,c492d887-b503-11ea-8362-005056b7b179:1-1665124398",
                        "syncerBinlog": "(mysqlbin.025307, 27748593)",
                        "syncerBinlogGtid": "27a0991c-b504-11ea-9bc4-005056b73203:1-5220,c492d887-b503-11ea-8362-005056b7b179:1-1665124326",
                        "blockingDDLs": [
                        ],
                        "unresolvedGroups": [
                        ],
                        "synced": false,
                        "binlogType": "remote",
                        "secondsBehindMaster": "0"
                    }
                }
            ]
        }
    ]
}

At this point, the operation of adding new tables to an existing DM synchronization task is complete.

(Due to the author’s limited ability, if there are any technical or descriptive errors in the article, please correct them in time. Thank you very much!)