DM MySQL Multiple Databases Sync to TiDB: DML Only Affects Local Database Data, DDL Affects All Data

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM MySQL多个库同步至TIDB DML仅影响本库数据 DDL影响所有数据

| username: tjdagang1

【TiDB Usage Environment】Production\Test Environment\POC
【TiDB Version】TIDB v5.4.0 DM v2.0.1
【Problem Encountered】Using DM to synchronize MySQL to TiDB. The delete statement executed on the upstream MySQL only affects the synchronized data of the current database, while DDL statements affect all statements. Please help explain the principle or provide official documentation.
【Reproduction Path】What operations were performed to encounter the problem
【Problem Phenomenon and Impact】
For example, both MySQL A and MySQL B databases have a table test_dm that synchronizes data to TiDB C database test_dm. The DM synchronization method is all without event filtering.
The result is the same whether the table has a primary key or not.

A Database

create table test_dm
(
id int,
name varchar(20),
port varchar(20) default '3306',
primary key (id)
);

B Database

create table test_dm
(
id int,
name varchar(20),
port varchar(20) default '3307',
primary key (id)
);

C Database

create table test_dm
(
id int,
name varchar(20),
port varchar(20) default '3308')
;

A Database

MySQL [test_db]> insert into test_dm(id,name) values(1,'AA'),(2,'BB');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3306 |
|  2 | BB   | 3306 |
+----+------+------+
2 rows in set (0.00 sec)

B Database

mysql> insert into test_dm(id,name) values(1,'AA'),(2,'BB');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3307 |
|  2 | BB   | 3307 |
+----+------+------+
2 rows in set (0.00 sec)

C Database (all data synchronized)

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3306 |
|  2 | BB   | 3306 |
|  2 | BB   | 3307 |
|  1 | AA   | 3307 |
+----+------+------+
4 rows in set (0.00 sec)

A Database

MySQL [test_db]> delete from test_dm where id=1;
Query OK, 1 row affected (0.00 sec)

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  2 | BB   | 3306 |
+----+------+------+
1 row in set (0.01 sec)

C Database (only data with id=1 from A database is deleted)

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  2 | BB   | 3306 |
|  2 | BB   | 3307 |
|  1 | AA   | 3307 |
+----+------+------+
3 rows in set (0.00 sec)

B Database

mysql> delete from test_dm where id=2;
Query OK, 1 row affected (0.00 sec)

mysql> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  1 | AA   | 3307 |
+----+------+------+
1 row in set (0.00 sec)

C Database (only data with id=2 from B database is deleted)

MySQL [test_db]> select * from test_dm;
+----+------+------+
| id | name | port |
+----+------+------+
|  2 | BB   | 3306 |
|  1 | AA   | 3307 |
+----+------+------+
2 rows in set (0.00 sec)

B Database

mysql> truncate table test_dm;
Query OK, 0 rows affected (0.00 sec)

C Database (all data cleared)

mysql> select * from test_dm;
Empty set (0.00 sec)

【Attachments】

  • Relevant logs, configuration files, Grafana monitoring (https://metricstool.pingcap.com/)
  • TiUP Cluster Display information
  • TiUP Cluster Edit config information
  • TiDB-Overview monitoring
  • Corresponding module Grafana monitoring (if any, such as BR, TiDB-binlog, TiCDC, etc.)
  • Corresponding module logs (including logs one hour before and after the issue)

For questions related to performance optimization and fault troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output.

| username: buchuitoudegou | Original post link

Can you send the task configuration? Is shard-mode enabled for merging databases and tables migration (it will filter out truncate after enabling)?

FYI: 悲观模式下分库分表合并迁移 | PingCAP 文档中心

| username: tjdagang1 | Original post link

I would like to ask if DM records the upstream database information when loading data into TiDB. Does each record key contain upstream database information such as database name and row ID?

| username: buchuitoudegou | Original post link

Are you referring to the logs?

| username: tjdagang1 | Original post link

Does the DM write upstream information during the binlog conversion process or when generating TiDB key values?
Also, is there any official documentation explaining this?

| username: buchuitoudegou | Original post link

  1. If the database information and upstream information you mentioned refer to the schema, then this information will be recorded locally in DM and in the downstream TiDB (as a checkpoint); this information will be used to parse binlog, coordinate DDL synchronization, etc.
  2. If the key you mentioned refers to the key identifying rows in DM: DM will parse the primary key or unique key of the current binlog based on the schema information and use the primary key or unique key column of the row as the unique identifier. Then, based on the binlog, it will generate DML and finally let TiDB execute it (for example, if the upstream executes update xxx set a=NEW_VAL where b=10, DM will generate the DML based on the binlog of the changed row: update xxx set a=NEW_VAL where id=xxx …); therefore, the key here is just an ordinary unique key or primary key and does not contain additional information.
  3. If the key value you mentioned refers to the storage layer of TiDB, i.e., TiKV, storing key values: DM will not bypass TiDB to directly call TiKV’s interface. As for how TiKV encodes these inserted or modified rows, it is the same as directly executing the upstream SQL in TiDB (there is no upstream database information, only TiDB-related information).

Do you have any requirements to obtain some schema information from the upstream database after data migration to the downstream database?

| username: tjdagang1 | Original post link

Thank you, teacher. My project does not need to obtain upstream information. I just want to know, according to your explanation 2, how does the DM tool distinguish between the data with id=1 in the test_dm table of upstream database A and the data with id=1 in the test_dm table of upstream database B? If the data with id=1 is deleted from database A, the DM tool only executes the deletion for database A, and the data with id=1 from database B can still be queried in TiDB.

| username: buchuitoudegou | Original post link

FYI: 分表合并数据迁移最佳实践 | PingCAP 文档中心

If the id is not unique globally, it is not recommended to use it as a primary key/unique key. Refer to this document to resolve the issue.

| username: tjdagang1 | Original post link

Thank you, teacher.

| username: system | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.