Troubleshooting Data Loss When Using DataX to Synchronize MySQL to TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用DataX同步mysql到tidb丢失数据排查

| username: TiDB_C罗

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Encountered Problem: Problem Phenomenon and Impact]
Background: The business uses DataX to synchronize data from MySQL to TiDB and found data loss on the TiDB side.
The statements generated by DataX are insert…on duplicate key update. The TiDB table has a primary key and a unique index, and my initial suspicion is that the unique index is the cause. To verify this, I created a shadow table and am using DataX to write to both tables at the same time.
I am looking for troubleshooting ideas: are there any other causes that could lead to a similar situation?
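
To make the suspicion above concrete, here is a minimal sketch in Python of how an upsert can shrink the row count when the target has a unique index that the source lacks. It is a simplified in-memory model, not DataX or TiDB code. The `upsert` helper, the `code` column, and the sample rows are all hypothetical.

```python
def upsert(table, row, unique_keys):
    """Simplified model of INSERT ... ON DUPLICATE KEY UPDATE.

    table: list of dict rows.
    unique_keys: columns that must be unique (primary key included).
    If the new row collides with an existing row on any unique column,
    the existing row is overwritten in place and no row is added.
    """
    for existing in table:
        if any(existing[k] == row[k] for k in unique_keys):
            existing.update(row)  # overwrite instead of inserting
            return "updated"
    table.append(dict(row))
    return "inserted"


# Hypothetical source rows: `code` repeats, which is legal when only `id` is unique.
source_rows = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "A"},
    {"id": 3, "code": "B"},
]

upstream_like = []    # models a table where only `id` is unique
downstream_like = []  # models a table where `id` and `code` are both unique
for r in source_rows:
    upsert(upstream_like, r, unique_keys=["id"])
    upsert(downstream_like, r, unique_keys=["id", "code"])

print(len(upstream_like), len(downstream_like))  # 3 2
```

Every call succeeds, yet the second table ends up one row short because the row with `id` 2 silently replaced the row with `id` 1.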

| username: Jellybean | Original post link

What version of TiDB are you using? How many unique keys are there? Have you used any method other than DataX for synchronization verification?

In our other data synchronization scenarios (using DM, Flink, Syncer, and so on), we haven’t seen the problem you describe.

| username: TiDB_C罗 | Original post link

[This reply contained only an image, which is not available in the translated text.]

| username: tidb菜鸟一只 | Original post link

Check the DataX logs. They report how many records were read and how many were written.

| username: TiDB_C罗 | Original post link

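The verification results are in. The cause is indeed that the unique indexes are inconsistent between upstream and downstream. When DataX writes a row to TiDB that conflicts with an existing row on the downstream unique index, “on duplicate key update” overwrites the existing row instead of inserting a new one, so rows that are distinct upstream end up merged downstream.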
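
One way to check for this before a sync is to look for source rows that share a value in the columns that are unique only downstream. The following is a rough sketch in Python under stated assumptions: rows are already loaded as dictionaries, the primary key column is named `id`, and `find_collisions` and the sample data are hypothetical. Rows with a NULL (`None`) in the key are skipped, because MySQL-style unique indexes allow multiple NULLs.

```python
from collections import defaultdict


def find_collisions(rows, unique_cols, pk="id"):
    """Group primary keys by their value in `unique_cols`.

    Returns only the groups holding more than one primary key, that is,
    the rows that would overwrite each other under an upsert if
    `unique_cols` were a unique index on the target table.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in unique_cols)
        if any(v is None for v in key):
            continue  # NULLs do not collide in a unique index
        groups[key].append(row[pk])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


# Hypothetical upstream rows; `code` is assumed to be unique only downstream.
rows = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "A"},
    {"id": 3, "code": "B"},
    {"id": 4, "code": None},
    {"id": 5, "code": None},
]

collisions = find_collisions(rows, ["code"])
print(collisions)  # {('A',): [1, 2]}
```

An empty result suggests that the extra downstream unique index would not merge any of the checked rows.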

| username: TiDB_C罗 | Original post link

There are no anomalies in the logs; they all show normal return times and results.

| username: Kongdom | Original post link

:sweat_smile: The table structures are different. But that aside, is it possible to configure the action of directly overwriting duplicate primary keys?

| username: TiDB_C罗 | Original post link

It will overwrite unless you stop using “on duplicate key update” and let the insert throw an error instead.
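
As a hedged illustration of that option: assuming the job writes to TiDB through DataX's `mysqlwriter` plugin (the thread does not say which writer is used), the statement type is selected by the `writeMode` parameter. The plugin documentation lists `insert`, `replace`, and `update`, where `update` produces “on duplicate key update”. The sketch below builds a writer section in Python and prints it as JSON. The host, database, table, column names, and credentials are placeholders, not values from this thread.

```python
import json

# Hypothetical writer section of a DataX job file. Only "writeMode" matters here.
writer = {
    "name": "mysqlwriter",
    "parameter": {
        # "insert" issues plain INSERT statements, so a row that conflicts on
        # the primary key or a unique index is rejected rather than merged.
        "writeMode": "insert",
        "username": "PLACEHOLDER_USER",
        "password": "PLACEHOLDER_PASSWORD",
        "column": ["id", "code"],
        "connection": [
            {
                "jdbcUrl": "jdbc:mysql://tidb-host:4000/demo",
                "table": ["t_demo"],
            }
        ],
    },
}

print(json.dumps(writer, indent=2))
```

With a plain insert, conflicting rows should show up as failed or dirty records in the job result instead of silently replacing existing rows, which makes a schema mismatch visible.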

| username: Kongdom | Original post link

Oh, you used this, no wonder. We are planning to replace the ETL tool Kettle and use DataX instead.

| username: redgame | Original post link

It’s better to keep the table structures consistent between upstream and downstream.
