Troubleshooting Data Loss When Using DataX to Synchronize MySQL to TiDB

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用DataX同步mysql到tidb丢失数据排查

username: TiDB_C罗

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Encountered Problem: Problem Phenomenon and Impact]
Background: The business uses DataX to synchronize data from MySQL to TiDB and found that data was lost.
The statement generated by DataX is insert…on duplicate key update. The TiDB table has a primary key and a unique index, and my initial suspicion is that the unique index is the cause. To verify this, I have created a shadow table and am using DataX to insert into both tables simultaneously.
I am looking for troubleshooting ideas: are there any other causes that could lead to a similar situation?

username: Jellybean

What version of TiDB are you using? How many unique keys are there? Have you used any method other than DataX for synchronization verification?

In our own data synchronization scenarios (using DM, Flink, Syncer, and so on), we haven’t seen the problem you describe.

username: TiDB_C罗

[Image: not available]

username: tidb菜鸟一只

Check the DataX logs. DataX will indicate how much data has been read and how much data has been synchronized.

username: TiDB_C罗

The verification results are in. The cause is indeed an inconsistency in the unique index between upstream and downstream. Rows that DataX wrote to TiDB violated the downstream unique constraint, so instead of being inserted as new rows they overwrote existing ones.
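
To make the mechanism concrete, here is a minimal sketch in plain Python that models how insert…on duplicate key update behaves when the downstream table has a unique index the upstream lacks. It is not DataX or TiDB code: the `upsert` helper, the `order_no` column, and the sample rows are all hypothetical, and the model assumes every column is in the update list.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def upsert(table: List[Row], row: Row, unique_keys: Sequence[Tuple[str, ...]]) -> str:
    """Toy model of insert ... on duplicate key update on an in-memory table.

    If `row` matches an existing row on any key in `unique_keys`, that row is
    updated in place; otherwise `row` is appended. No error is ever raised.
    """
    for existing in table:
        for key in unique_keys:
            if all(existing[col] == row[col] for col in key):
                existing.update(row)  # overwrite: the old values are gone
                return "updated"
    table.append(dict(row))
    return "inserted"


# Hypothetical upstream rows: unique on id only, so order_no may repeat.
upstream = [
    {"id": 1, "order_no": "A100", "amount": 10},
    {"id": 2, "order_no": "A100", "amount": 20},
    {"id": 3, "order_no": "B200", "amount": 30},
]

# Hypothetical downstream table: primary key (id) plus a unique index (order_no).
downstream: List[Row] = []
actions = [upsert(downstream, r, [("id",), ("order_no",)]) for r in upstream]

print(actions)
print(len(upstream), "rows read,", len(downstream), "rows stored")
```

Three rows are read and every write succeeds, yet only two rows remain downstream, because the second row silently replaced the first. That would also be consistent with logs that show no anomalies.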

username: TiDB_C罗

There are no anomalies in the logs; they all show normal return times and results.

username: Kongdom

:sweat_smile: So the table structures are different. That aside, is it possible to configure it to directly overwrite rows with duplicate primary keys?

username: TiDB_C罗

The only alternative is to not use “on duplicate key update” and let it throw an error.
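
As a rough illustration of the "let it throw an error" option, the sketch below loads rows with a plain INSERT and collects the ones the database rejects. It uses Python's built-in `sqlite3` module purely as a stand-in for TiDB, so the error type and message will differ on a real cluster; the `orders` table, its columns, and the `load_strict` helper are hypothetical.

```python
import sqlite3


def load_strict(rows):
    """Insert rows with a plain INSERT and collect the ones the database rejects."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the downstream table: primary key plus one unique index.
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, order_no TEXT UNIQUE, amount INTEGER)"
    )
    rejected = []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO orders (id, order_no, amount) VALUES (?, ?, ?)", row
            )
        except sqlite3.IntegrityError as exc:
            # The conflict is surfaced instead of silently overwriting a row.
            rejected.append((row, str(exc)))
    conn.commit()
    stored = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return stored, rejected


rows = [(1, "A100", 10), (2, "A100", 20), (3, "B200", 30)]
stored, rejected = load_strict(rows)
print(stored, "rows stored")
for row, reason in rejected:
    print("rejected:", row, reason)
```

The conflicting row is still not stored, but it now shows up as an explicit failure that can be investigated, and the existing row keeps its original values.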

username: Kongdom

Oh, you used this, no wonder. We are planning to replace the ETL tool Kettle and use DataX instead.

username: redgame

It’s better to keep the table structures consistent between upstream and downstream.
