How to Change DM Full Synchronization to Incremental Synchronization

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM全量同步,怎样改为增量同步

| username: TiDBer_OB4kHrS7

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】V7.5
【Reproduction Path】I am using DM to synchronize from an upstream MySQL to TiDB, and the fully synchronized data is taking up disk space. I no longer need the full data, only incremental synchronization. Can I stop the current task, clear the data on the target end, and then start synchronizing incremental data from the current GTID position?

| username: ziptoam | Original post link

Actually, you can retain the incremental synchronization logic and periodically delete the expired backup files.

| username: zhaokede | Original post link

You can just add an incremental synchronization task.

| username: TiDBer_OB4kHrS7 | Original post link

How do I delete expired backup files? Should I delete the data in the tables directly?

| username: TiDBer_OB4kHrS7 | Original post link

When adding an incremental synchronization task, should the existing data in the table be directly cleared?

| username: ziptoam | Original post link

I haven't tried this myself either. See if this document helps:
TiDB Log Backup and PITR Guide | PingCAP Documentation Center

| username: zhaokede | Original post link

No, it won't clear anything. Incremental synchronization only syncs changes made from its starting point onward.

| username: Kamner | Original post link

If you only perform incremental synchronization, the upstream and downstream data may become inconsistent and the task may report errors. You still need to think about what your actual requirements are.

For example:
An UPDATE to a row that was written before the incremental starting point may fail because that row does not exist downstream.

| username: ziptoam | Original post link

Is this useful?

| username: zhaokede | Original post link

The task runs a full synchronization first and then switches to incremental synchronization, so in theory the data should be consistent.

| username: okenJiang | Original post link

Is task-mode in your DM task configuration file set to full?

  • If it is full, the full import is a one-time job that ends when the import completes. To replicate changes afterwards, start a separate incremental task.
  • If it is all, DM switches to incremental synchronization automatically once the full import finishes, with no further action needed.

Introduction to the Complete DM Task Configuration File | PingCAP Documentation Center
task-mode: all # Task mode: "full" (full data migration only), "incremental" (real-time binlog synchronization only), or "all" (full migration followed by real-time binlog synchronization)

| username: TiDBer_OB4kHrS7 | Original post link

I am currently using full + incremental synchronization (all). To save disk space, I only want the incremental data. If I don't clear the existing data, that goal isn't achieved, and there would be no point in changing anything.

| username: TiDBer_OB4kHrS7 | Original post link

That risk is real. Without the earlier data, updates will report errors. Whether it matters depends on the requirements and on how old the updated rows are.

| username: 有猫万事足 | Original post link

Your initial approach is roughly correct.

Change task-mode in the task configuration to incremental, stop the task, delete the target data, and then start the task again.
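
As a rough sketch only, an incremental task file for this approach might look like the following. The task name, source-id, connection details, and the GTID placeholder are all hypothetical and must be replaced with real values; the GTID set can be read from the upstream with SHOW MASTER STATUS (Executed_Gtid_Set). Note that DM uses meta only when no checkpoint exists downstream, so if the old task's checkpoint is still there, the task would probably need a new name or have its metadata removed when it is started.

```yaml
# Hypothetical incremental-only DM task; all names and addresses are placeholders.
name: "mysql-to-tidb-incr"      # assumed new task name, so no old checkpoint is reused
task-mode: incremental          # binlog replication only, no dump/load phase

target-database:
  host: "127.0.0.1"             # placeholder TiDB address
  port: 4000
  user: "root"
  password: ""

mysql-instances:
  - source-id: "mysql-01"       # placeholder; must match an existing DM source
    meta:
      # Starting point; used only when no downstream checkpoint exists.
      binlog-gtid: "<Executed_Gtid_Set from SHOW MASTER STATUS>"
```

Starting from the current position means rows written before that position never reach TiDB, which is the consistency risk raised earlier in this thread.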

| username: TiDBer_OB4kHrS7 | Original post link

There is one problem, though. As mentioned earlier, once the target-end data is deleted, a later DELETE or UPDATE against a row that no longer exists on the target end will cause the DM task to stop.

| username: 有猫万事足 | Original post link

Enabling safe mode can solve this problem.
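
For illustration, safe mode can be turned on in the syncers section of the task file. In safe mode DM rewrites INSERT as REPLACE and UPDATE as DELETE plus REPLACE, so an update to a row that is missing downstream writes the new row instead of failing. This is only a sketch: the config name "global" and the source-id are placeholders, and tables are assumed to have a primary key or unique index.

```yaml
# Hypothetical fragment of a DM task file; merge into the existing task config.
mysql-instances:
  - source-id: "mysql-01"           # placeholder source
    syncer-config-name: "global"    # refers to the syncers entry below

syncers:
  global:
    safe-mode: true                 # INSERT -> REPLACE, UPDATE -> DELETE + REPLACE
```

With this setting, TiDB ends up holding only the rows touched after the starting point, so it is a partial copy of the upstream rather than a consistent replica.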

| username: TiDBer_OB4kHrS7 | Original post link

Okay, I’ll test it another day.

| username: YuchongXU | Original post link

No, there should be basic data of level 8.