How to Dynamically Add or Remove Blacklist and Whitelist Filters in TiDB Data Migration?

translator_bot · June 22, 2024, 12:39pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB Data Migration 黑白名单过滤，如何动态增减？ (TiDB Data Migration blacklist and whitelist filtering: how to add or remove entries dynamically?)

| username: love-cat

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.2
[Reproduction Path] Operations performed that led to the issue
I use DM to replicate incremental data from MySQL to TiDB, and when I set up the task I used a blacklist to filter out some tables. Now I want to remove some of those tables from the filter:
block-allow-list:
  ba-rule1:
    do-dbs: ["thc_1125_abcd"]
    ignore-tables:
    - db-name: "thc_1125_abcd"
      tbl-name: "ACT_*"
    - db-name: "thc_1125_abcd"
      tbl-name: "QRTZ_*"
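To see what removing an entry from ba-rule1 changes, here is a simplified Python model of the rule. It is not DM's implementation: `is_synced` and `without_ignore` are hypothetical helpers, wildcards are approximated with `fnmatch.fnmatchcase`, and other rule fields (ignore-dbs, do-tables, `~` regular expressions) are left out.

```python
from fnmatch import fnmatchcase

# Simplified copy of ba-rule1; only do-dbs and ignore-tables are modeled.
RULE = {
    "do-dbs": ["thc_1125_abcd"],
    "ignore-tables": [
        {"db-name": "thc_1125_abcd", "tbl-name": "ACT_*"},
        {"db-name": "thc_1125_abcd", "tbl-name": "QRTZ_*"},
    ],
}


def is_synced(rule: dict, db: str, table: str) -> bool:
    """Return True if (db, table) passes this simplified block-allow-list rule."""
    # The schema must match one of the do-dbs patterns.
    if not any(fnmatchcase(db, pattern) for pattern in rule["do-dbs"]):
        return False
    # The table must not match any ignore-tables entry for that schema.
    for entry in rule["ignore-tables"]:
        if fnmatchcase(db, entry["db-name"]) and fnmatchcase(table, entry["tbl-name"]):
            return False
    return True


def without_ignore(rule: dict, tbl_pattern: str) -> dict:
    """Return a copy of the rule with the ignore-tables entry for tbl_pattern removed."""
    return {
        "do-dbs": list(rule["do-dbs"]),
        "ignore-tables": [e for e in rule["ignore-tables"] if e["tbl-name"] != tbl_pattern],
    }


if __name__ == "__main__":
    print(is_synced(RULE, "thc_1125_abcd", "ACT_RU_TASK"))     # False
    updated = without_ignore(RULE, "ACT_*")
    print(is_synced(updated, "thc_1125_abcd", "ACT_RU_TASK"))  # True
    print(is_synced(updated, "thc_1125_abcd", "QRTZ_LOCKS"))   # False
```

The model covers only the filter decision for each table, not how a running DM task picks up the edited rule.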

[Encountered Issue: Issue Phenomenon and Impact]
Issue: How can I dynamically remove some tables from the blacklist? For example, what should I do if I want the tables whose names start with ACT_ to be synchronized?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: love-cat

How can I dynamically remove an entry from the blacklist? I tested the following method, and it did not work:

1. Modify task.yaml.
2. Stop the task.
3. Start the task with the new YAML file.

Does anyone have a better method?

| username: love-cat

I read the official documentation, and the procedure feels very cumbersome (deploying, deleting, and recreating the synchronization): Data Migration FAQ | PingCAP Docs (Data Migration 常见问题 | PingCAP 文档中心)

| username: CuteRay

Start more DM-workers and keep each task as fine-grained as possible: stop a task when its tables no longer need to be synchronized, and start a new task when new tables do.

| username: love-cat

Our upstream instance hosts many databases, but we synchronize only one of them. If we start multiple workers, each one will have to pull the instance's full binlog separately.

| username: 特雷西-迈克-格雷迪

What error is reported?

| username: love-cat

No error is logged, and the official documentation also says this approach doesn't work. However, the official steps are a bit hard to understand.

| username: love-cat

Can someone please explain the document?

| username: okenJiang

The document means that you need to start a separate full-migration task for the newly added table.

Why is this necessary? First, understand that the binlog is a stream. The DM task you are running has already advanced to some intermediate position, say binlog999. If you now add a new table, table1, to the synchronization, DM will not go back and replay the binlogs before binlog999. But table1 already had data before that point, and that earlier data would be lost. That is why you need to start a new full task.
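The binlog-as-a-stream argument can be made concrete with a toy Python model. This is not DM code: integer positions stand in for binlog file/offset pairs, the hypothetical `replicate` stands in for incremental sync resuming from a checkpoint, and the hypothetical `table_rows` stands in for the full dump of a table.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A simplified binlog row event: stream position, table, and inserted row."""
    pos: int
    table: str
    row: str


# Toy binlog of one upstream instance, in commit order (hypothetical data).
BINLOG = [
    Event(1, "orders", "o1"),
    Event(2, "table1", "t1-a"),  # written before table1 is added to the task
    Event(3, "orders", "o2"),
    Event(4, "table1", "t1-b"),  # written after table1 is added to the task
]


def replicate(binlog, allowed, start_after):
    """Incremental sync: apply events for allowed tables strictly after the checkpoint."""
    downstream = {}
    for ev in binlog:
        if ev.pos <= start_after or ev.table not in allowed:
            continue
        downstream.setdefault(ev.table, []).append(ev.row)
    return downstream


def table_rows(binlog, table, up_to):
    """Upstream contents of `table` at position `up_to`; stands in for a full dump."""
    return [ev.row for ev in binlog if ev.table == table and ev.pos <= up_to]


if __name__ == "__main__":
    checkpoint = 3  # the running task has already consumed events 1..3
    latest = BINLOG[-1].pos
    incremental = replicate(BINLOG, {"orders", "table1"}, start_after=checkpoint)
    print(incremental.get("table1"))                   # ['t1-b']: t1-a is never replayed
    print(table_rows(BINLOG, "table1", up_to=latest))  # ['t1-a', 't1-b'] upstream
    # Separate full task for table1: dump at the checkpoint, then sync incrementally.
    full = table_rows(BINLOG, "table1", up_to=checkpoint) + incremental["table1"]
    print(full)                                        # ['t1-a', 't1-b']
```

Resuming from the checkpoint with the wider filter delivers only t1-b; pairing a dump taken at the checkpoint with the incremental events is what recovers t1-a.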

You can re-read the document with this in mind. If you have any suggestions for improving it, feel free to propose them directly.

| username: love-cat

Okay, thank you.

| username: love-cat

Hello, are the steps roughly correct?

Existing synchronization task information:
Task name: thc_1234_dev
Corresponding data source: thc_1234_dev
Blacklist configuration: Do not synchronize tables starting with thc_data_*
Requirement:
I now want to synchronize tables starting with thc_data_*. The required steps are:
1. Create a new synchronization task, thc_1234_dev_thc_data, that uses the data source thc_1234_dev and has a whitelist that synchronizes only the tables starting with thc_data_*.
2. Stop both tasks, thc_1234_dev and thc_1234_dev_thc_data, and record each task's binlog_name and binlog_pos.
3. Modify the thc_1234_dev task YAML to add the following content:
     syncers:
       global:
         worker-count: 16
         batch: 100
         enable-ansi-quotes: true
         safe-mode: true
         safe-mode-duration: "60s"
         compact: false
         multiple-rows: false
4. Set thc_1234_dev's binlog_name and binlog_pos to the earlier of the two positions recorded in step 2.
5. Start the thc_1234_dev task.
6. Check whether syncerBinlog has moved past the positions recorded in step 2 (that is, past the later of the two); if so, remove safe-mode: true and restart the task.
7. Can the thc_1234_dev_thc_data task then be deleted?
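Steps 2, 4 and 6 all come down to comparing binlog positions. Below is a minimal Python sketch of that comparison, assuming positions are written in the `(file, offset)` form such as `(mysql-bin.000123, 4567)` that query-status appears to use for syncerBinlog, and that both positions come from the same upstream instance. `BinlogPos`, `parse_pos`, `earliest` and `safe_to_leave_safe_mode` are hypothetical names, and the sketch does not change DM's start position itself. Restarting from the earlier position replays the events between the two positions for one set of tables, which is why the check in step 6 uses the later one.

```python
import re
from typing import NamedTuple

# Assumed textual format, e.g. "(mysql-bin.000123, 4567)"; check your own query-status output.
_POS_RE = re.compile(r"^\(\s*(?P<name>[^,\s]+)\s*,\s*(?P<pos>\d+)\s*\)$")


class BinlogPos(NamedTuple):
    name: str  # binlog file name, e.g. "mysql-bin.000123"
    pos: int   # offset within that file

    def sort_key(self):
        # Files from one instance share a base name and differ by a numeric suffix,
        # so order by (suffix as an integer, offset) rather than by raw string.
        _, _, suffix = self.name.rpartition(".")
        return (int(suffix), self.pos)


def parse_pos(text: str) -> BinlogPos:
    """Parse "(file, offset)" into a BinlogPos; raise ValueError if it does not match."""
    m = _POS_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized binlog position: {text!r}")
    return BinlogPos(m.group("name"), int(m.group("pos")))


def earliest(*positions: BinlogPos) -> BinlogPos:
    """Step 4: the position from which the merged task should restart."""
    return min(positions, key=BinlogPos.sort_key)


def safe_to_leave_safe_mode(current: BinlogPos, *recorded: BinlogPos) -> bool:
    """Step 6: True once the task has moved past every position recorded in step 2."""
    latest = max(recorded, key=BinlogPos.sort_key)
    return current.sort_key() > latest.sort_key()


if __name__ == "__main__":
    main_task = parse_pos("(mysql-bin.000124, 120)")
    new_task = parse_pos("(mysql-bin.000123, 98765)")
    print(earliest(main_task, new_task))  # BinlogPos(name='mysql-bin.000123', pos=98765)
    status = parse_pos("(mysql-bin.000124, 4000)")
    print(safe_to_leave_safe_mode(status, main_task, new_task))  # True
```

Comparing the numeric suffix first keeps the ordering correct even when one position is in a later file with a smaller offset.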

| username: Min_Chen

Hello,

There is no need to create another task; you can add the tables to the existing task directly.
