How DM Synchronizes Multiple Databases to TiDB: Implementing DML to Only Delete Records from the Current Database and the Impact of DDL on Data in Other Databases

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM 多个库同步至TIDB DML是如何实现仅删除本库记录 DDL影响其他库数据

| username: tjdagang1

【TiDB Usage Environment】Production\Test Environment\POC
【TiDB Version】TIDB v5.4.0 DMv2.0.1
【Problem Encountered】Consulting about DM synchronizing multiple databases to TiDB. When the upstream source database executes delete table, only the data from that specific database is deleted in the TiDB table, while the synchronized data from other databases is not deleted. What is the underlying principle? Is it through transaction ID? When DDL truncate is executed upstream, TiDB data synchronization affects data from other databases.
【Reproduction Path】What operations were performed to encounter the problem
【Problem Phenomenon and Impact】For example, MYSQL A database and MYSQL B database both have table test_dm synchronizing data to TiDB C database test_dm. The DM synchronization method is all without event filtering.
A database

create table test_dm
(
id int,
name varchar(20),
port varchar(20) default '3306',
primary key (id)
);

B database

create table test_dm
(
id int,
name varchar(20),
port varchar(20) default '3307',
primary key (id)
);

C database

create table test_dm
(
id int,
name varchar(20),
port varchar(20) default '3308')
;

A database

MySQL [test_db]> insert into test_dm(id,name) values(1,'AA'),(2,'BB');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3306 |
|  2 | BB   | 3306 |
+----+------+------+
2 rows in set (0.00 sec)

B database

mysql> insert into test_dm(id,name) values(1,'AA'),(2,'BB');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3307 |
|  2 | BB   | 3307 |
+----+------+------+
2 rows in set (0.00 sec)

C database (all data synchronized)

MySQL [test_db]> select * from test_dm;
+------+------+------+
| id   | name | port |
+------+------+------+
|    1 | AA   | 3306 |
|    2 | BB   | 3306 |
|    2 | BB   | 3307 |
|    1 | AA   | 3307 |
+------+------+------+
4 rows in set (0.00 sec)

A database

MySQL [test_db]> delete from test_dm where id=1;
Query OK, 1 row affected (0.00 sec)

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  2 | BB   | 3306 |
+----+------+------+
1 row in set (0.01 sec)

C database (only data with id=1 from A database is deleted)

MySQL [test_db]> select * from test_dm;
+------+------+------+
| id   | name | port |
+------+------+------+
|    2 | BB   | 3306 |
|    2 | BB   | 3307 |
|    1 | AA   | 3307 |
+------+------+------+
3 rows in set (0.00 sec)

B database

mysql> delete from test_dm where id=2;
Query OK, 1 row affected (0.00 sec)

mysql> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3307 |
+----+------+------+
1 row in set (0.00 sec)

C database (only data with id=2 from B database is deleted)

MySQL [test_db]> select * from test_dm;
+------+------+------+
| id   | name | port |
+------+------+------+
|    2 | BB   | 3306 |
|    1 | AA   | 3307 |
+------+------+------+
2 rows in set (0.00 sec)

B database

mysql> truncate table test_dm;
Query OK, 0 rows affected (0.00 sec)

C database (all data cleared)

mysql> select * from test_dm;
Empty set (0.00 sec)

【Attachments】

  • Relevant logs, configuration files, Grafana monitoring (https://metricstool.pingcap.com/)
  • TiUP Cluster Display information
  • TiUP Cluster Edit config information
  • TiDB-Overview monitoring
  • Corresponding module Grafana monitoring (if any, such as BR, TiDB-binlog, TiCDC, etc.)
  • Corresponding module logs (including logs one hour before and after the issue)

If the question is related to performance optimization or troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output.

| username: xfworld | Original post link

When replaying binlogs, they are replayed according to the state at the time of data deletion and in order.

For example:
There are databases A, B, and C, which are merged into TiDB database E, and there are no conflicts in the unique identifiers after merging.
If a delete operation is performed on table XX in database C, the binlog of database C will record the actual delete record information of table XX. DM will receive these records and initiate corresponding operations on TiDB.

This is my understanding, and I hope it helps you.

Here are the reference documents:

| username: 杨小柯南 | Original post link

Refer to this and give it a try: DM 任务完整配置文件介绍 | PingCAP 文档中心
You can set filter conditions for different databases, and you can also configure multiple data sources.

| username: tjdagang1 | Original post link

Thank you very much. My initial description might have had some issues. I have now added some examples. Please help me review them.

| username: Hacker007 | Original post link

Each row in the table has a unique row key to identify a unique piece of data in the distributed database.

| username: xiaohetao | Original post link

:+1::+1::+1:

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.