Update the task configuration file or resynchronize the filtered database?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 重新更新任务配置文件,或者重新同步过滤的库?

| username: TIDB_入门学者

[TiDB Usage Environment] Test Environment
[TiDB Version] 6.3
[Encountered Issue]
task-mode: all
Scenario:

  1. The original data migration task configuration filters a specific table in a specific database, and it has been running normally for a while.
  2. Now I want to resynchronize the previously filtered table.
  3. How should I update the task configuration file, or what steps are needed to support this scenario?

Test:
I used the command `tiup dmctl --master-addr xxx start-task --remove-meta ./task.yaml`.
This command seems to synchronize the previously ignored table. What impact does it have on subsequent synchronization?

| username: jansu-dev

  1. Using this command seems to synchronize previously ignored tables. What impact does this command have on subsequent synchronization?
    a. In `all` mode, a task goes through three stages: dump (exporting schemas and table data from upstream), load (importing that data into the downstream), and sync (continuously replicating the binlog).
    b. The task's checkpoints and metadata are stored in the downstream database (they are not updated in real time). If the checkpoint is forcibly removed with `--remove-meta`, the task starts over from the dump stage and re-imports the data, which may overwrite downstream data that has not been cleared. With a small data volume this is not a big issue, but it adds extra work, and the resulting replication delay depends on machine performance and data volume.


  2. How should the task configuration file be updated, or what steps support this scenario?
    a. Someone asked the same question before in the TiDB Q&A community: "请问修改了DM的task.yaml文件,如何重启生效?" ("After modifying DM's task.yaml file, how do I restart the task so the change takes effect?")
    b. To ignore a specific table, use Block & Allow Table Lists; see "主要特性" ("Key Features") in the PingCAP documentation center.
    c. In summary, configuring Block & Allow Table Lists and then restarting the task should let you ignore the table.
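
To make point 2 concrete, here is a deliberately simplified Python model of how a Block & Allow list decides whether a table is replicated. It is not DM's real matching algorithm: it only models exact-name `do-dbs` and `ignore-tables` rules (wildcards, `ignore-dbs`, `do-tables` and case-sensitivity handling are left out), and the class name and the `shop` database/table names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAllowList:
    """Simplified model of a block-allow-list rule (exact names only)."""

    do_dbs: set[str] = field(default_factory=set)
    ignore_tables: set[tuple[str, str]] = field(default_factory=set)

    def allows(self, db: str, table: str) -> bool:
        # In this model an empty do_dbs means "all databases".
        if self.do_dbs and db not in self.do_dbs:
            return False
        # Tables listed in ignore_tables are filtered out.
        return (db, table) not in self.ignore_tables


# Hypothetical original task: replicate db "shop" but skip shop.audit_log.
rules = BlockAllowList(do_dbs={"shop"}, ignore_tables={("shop", "audit_log")})
print(rules.allows("shop", "orders"))
print(rules.allows("shop", "audit_log"))

# Updated configuration: the ignore rule for the table is removed.
rules.ignore_tables.discard(("shop", "audit_log"))
print(rules.allows("shop", "audit_log"))
```

Changing the rule only affects binlog events the syncer processes after the restart; the sync stage alone does not backfill the historical rows of a table that was previously filtered. That gap is why the other replies in this thread resort to `--remove-meta` (a full re-import) or a separate single-table task.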

| username: Hacker007

This command should delete the metadata tables and then resynchronize the data from scratch. A prerequisite is that the downstream tables are cleared first.
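
The effect of `--remove-meta` described in the replies above can be summarized with a small Python model. This is a hypothetical, simplified state model, not DM code: `Checkpoint`, `plan_start` and `STAGES` are invented names that only mirror the dump/load/sync stages of an `all`-mode task, and the warning text paraphrases the risk mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Stages of an `all`-mode task, in order (as described in this thread).
STAGES = ["dump", "load", "sync"]


@dataclass
class Checkpoint:
    """Hypothetical saved progress of an `all`-mode task."""

    stage: str  # one of STAGES
    position: str  # e.g. a binlog GTID set; the value is illustrative only


def plan_start(checkpoint: Optional[Checkpoint], remove_meta: bool,
               downstream_empty: bool) -> tuple[list[str], list[str]]:
    """Return (stages to run, warnings) when the task is started."""
    warnings: list[str] = []
    if remove_meta:
        # --remove-meta discards the saved checkpoint and metadata.
        checkpoint = None
    if checkpoint is None:
        if not downstream_empty:
            warnings.append("downstream tables are not empty; re-imported "
                            "rows may conflict with or overwrite them")
        return list(STAGES), warnings
    # Otherwise the task resumes from the stage recorded in its checkpoint.
    return STAGES[STAGES.index(checkpoint.stage):], warnings


saved = Checkpoint(stage="sync", position="uuid:1-100")
print(plan_start(saved, remove_meta=False, downstream_empty=False))
print(plan_start(saved, remove_meta=True, downstream_empty=False))
```

The interesting branch is `remove_meta=True`: the task starts over from dump no matter how far it had progressed, which is why clearing the downstream tables first is recommended.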

| username: dba-kit

Here is the procedure I follow:

  1. Create a new task dedicated to synchronizing that table.
  2. Wait for the new task to catch up, then pick an off-peak period when that table receives few changes.
  3. Make sure neither task has any replication lag, then stop the original full-synchronization task first and the new single-table task second. This order ensures that the GTID set executed by the full task is contained in (no later than) that of the new-table task, so no changes to the new table are lost.
  4. Modify the full task's configuration so that it no longer filters the table, then start the full task again. When a task is started or resumed, DM automatically enables safe mode, so in theory any changes made to the new table between the two stop-task operations are still replicated correctly.
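
Step 3 relies on the full task's executed GTID set being contained in the single-table task's GTID set once both are stopped. A minimal Python sketch of that check follows, assuming the two GTID sets have been copied from each task's status (for example from `query-status` output) and use the classic untagged MySQL format `uuid:interval[:interval...]`. The function names and the UUID are hypothetical.

```python
def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort intervals and merge overlapping or adjacent ones."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse e.g. 'uuid1:1-5:7,uuid2:1-10' into {uuid: merged intervals}."""
    parsed: dict[str, list[tuple[int, int]]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = parsed.setdefault(uuid.strip().lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return {uuid: _merge(iv) for uuid, iv in parsed.items()}


def is_gtid_subset(a: str, b: str) -> bool:
    """True if every transaction in GTID set `a` is also in GTID set `b`."""
    set_a, set_b = parse_gtid_set(a), parse_gtid_set(b)
    for uuid, intervals in set_a.items():
        covering = set_b.get(uuid, [])
        for start, end in intervals:
            if not any(cs <= start and end <= ce for cs, ce in covering):
                return False
    return True


# Hypothetical positions after both tasks were stopped (full task first).
full_task = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-1000"
table_task = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-1005"
print(is_gtid_subset(full_task, table_task))
```

If the check returns False, the single-table task is behind the full task, the ordering assumption of step 3 does not hold, and restarting the full task from its checkpoint could skip some of that table's changes.
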

| username: dba-kit

The standard procedure is described in the official documentation, but it requires manually modifying the metadata.