Can someone take a look? When DM synchronizes a table without a primary key, why does some of the synchronized data become duplicated over time?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 有没有牛爷爷帮忙看看!!!DM同步一张无主键的表,为什么时间一长,同步的数据就有一些重复了

| username: TiDBer_STGGd1J1

[TiDB Usage Environment] Production Environment
[Attachments: Screenshots/Logs/Monitoring]
Table structure: (screenshot not preserved)
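Some of the data becomes duplicated after the task has been running for a while.
Wrong (duplicated) data: (screenshot not preserved)

Correct data: (screenshot not preserved)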

The table is refreshed by a scheduled job:
The first script runs `delete from pts_ir128668`
The second script runs the insert statement

| username: TiDBer_STGGd1J1

Task configuration template: (screenshot not preserved)

| username: TiDBer_STGGd1J1

Could this cause a certain segment of the binlog to be read and replayed more than once?

| username: WalterWj

The DM task may have restarted, or a switchover may have occurred. Afterwards DM runs in safe mode for a period, during which some data is replayed more than once so that nothing is lost.
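A minimal sketch of how this can leave lasting duplicates, under stated assumptions: the upstream binlog is in ROW format (which DM requires), so `delete from pts_ir128668` is logged as one delete event per row; after a restart DM resumes from its last saved checkpoint, which may lag behind what was already written downstream; and in safe mode an `INSERT` is executed as `REPLACE`, which only prevents duplicates when the table has a primary or unique key. The event tuples, the `replay` helper and the checkpoint value are hypothetical, and the assumption that a replicated delete removes a single matching copy is labeled in the code.

```python
from collections import Counter

def replay(events, start, table):
    """Apply events[start:] to an unkeyed table, modelled as a multiset of rows.

    Safe mode turns an insert into REPLACE; with no primary or unique key there
    is no conflicting row to remove, so the row is simply added again.
    Assumption (hypothetical model): a replicated delete removes one matching copy.
    """
    for op, row in events[start:]:
        if op == "insert":
            table[row] += 1
        elif table[row] > 0:
            table[row] -= 1
    return +table  # drop rows whose count fell to zero

rows = [("a", 1), ("b", 2), ("c", 3)]
inserts = [("insert", r) for r in rows]

downstream = replay(inserts, 0, Counter())  # all three inserts applied
saved_checkpoint = 1                        # hypothetical: last flushed position
downstream = replay(inserts, saved_checkpoint, downstream)  # resume after restart
print(sorted(downstream.items()))  # [(('a', 1), 1), (('b', 2), 2), (('c', 3), 2)]

# Next scheduled run: upstream holds one copy of each row, so the full-table
# delete logs exactly one delete event per row, then the inserts follow.
next_run = [("delete", r) for r in rows] + inserts
downstream = replay(next_run, 0, downstream)
print(sorted(downstream.items()))  # duplicates of b and c remain
```

In this model the replayed inserts duplicate `b` and `c`, and because the next run's full-table delete carries only one delete event per upstream row, the extra copies survive it, which would match duplicates that accumulate over time.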

| username: TiDBer_STGGd1J1

What exactly counts as a switchover? Could an upstream primary/secondary switchover also cause this? What about restarting the worker, or resuming the task?

| username: TIDB-Learner

With no primary key and no unique index, duplicate data is completely normal :smiley: :smiley:
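To illustrate, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for the downstream database. This is an assumption for demonstration only; SQLite's `REPLACE` follows the same rule as MySQL/TiDB here, removing an existing row only when the new row conflicts on a primary key or unique index. Replaying the same `REPLACE` twice, as safe mode may do, duplicates the row in a table without a key but not in one with a key. The table names are hypothetical, and the final query is a generic way to list duplicated rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one without any key, one with a unique key on (name).
conn.execute("CREATE TABLE no_key (name TEXT, val INTEGER)")
conn.execute("CREATE TABLE with_key (name TEXT, val INTEGER, UNIQUE (name))")

row = ("a", 1)
for table in ("no_key", "with_key"):
    for _ in range(2):  # the same event replayed twice
        conn.execute(f"REPLACE INTO {table} (name, val) VALUES (?, ?)", row)

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(count("no_key"), count("with_key"))  # 2 1

# List rows that appear more than once in the unkeyed table.
dups = conn.execute(
    "SELECT name, val, COUNT(*) FROM no_key GROUP BY name, val HAVING COUNT(*) > 1"
).fetchall()
print(dups)  # [('a', 1, 2)]
```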

| username: yytest

A table without a primary key should be a non-clustered table, right? The system should then generate a unique row ID for it automatically.
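On the hidden ID: TiDB does assign an internal row ID (`_tidb_rowid`) to a table without a clustered primary key, but the downstream generates a fresh one for every row it inserts, so the ID cannot tell a replayed row from a new one. A minimal sketch, using SQLite's implicit `rowid` as an analogue (an assumption; this is not TiDB's implementation) and a hypothetical table name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared key: SQLite still gives every row an implicit, auto-assigned rowid.
conn.execute("CREATE TABLE no_key (name TEXT, val INTEGER)")

for _ in range(2):  # the same row replayed twice
    conn.execute("INSERT INTO no_key (name, val) VALUES (?, ?)", ("a", 1))

rows = conn.execute("SELECT rowid, name, val FROM no_key ORDER BY rowid").fetchall()
print(rows)  # [(1, 'a', 1), (2, 'a', 1)]
```

Each copy receives its own ID, so the automatically generated ID makes the rows distinct rather than preventing the duplicate.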

| username: 小龙虾爱大龙虾

DM requires tables to have a primary key or a valid unique index for data synchronization. Refer to: