Data Loss After REPLACE INTO

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: replace into 后丢失数据

| username: TiDBer_Bo0lt2rY

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.2
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Phenomenon and Impact]

  1. Phenomenon One: Using Golang programming, multiple REPLACE INTO statements are executed within a transaction, resulting in data loss. The SQL statements are as follows:
REPLACE INTO ri_cg_report(`tenant`, `reportid`, `storeid`, `pla
REPLACE INTO ri_cg_report_info(`tenant`, `reportid`, `templatei
REPLACE INTO ri_cg_report_item(`tenant`, `itemid`, `reportid`, 
REPLACE INTO ri_cg_report_item(`tenant`, `itemid`, `reportid`, 
.... // Several lines omitted here
REPLACE INTO ri_cg_report_summary_item_relation(`tenant`, `repo
REPLACE INTO ri_item_disqualified(`tenant`, `disqualified_id`, 
REPLACE INTO ri_item_disqualified(`tenant`, `disqualified_id`, 

After executing the above transaction, 35 records should be inserted into 4 tables, but it was found that the first and second tables had no inserts, the third table had some missing data, and the fourth table was complete. The transaction did not report any errors!

  1. Phenomenon Two: Executing REPLACE INTO table1 SELECT * FROM table2 in the code returns a modified record count that does not match the actual count. Upon investigation, the inserted data does not match the actual data, but no errors were reported.

The above two issues: Phenomenon Two occurred a few times in version 6, but after upgrading to version 7, it hasn’t happened in the past two months. Phenomenon One was just discovered, and it’s uncertain if it occurred previously.

It’s unclear whether the issue is with Golang or TiDB. The critical point is that it doesn’t report any errors!!! It’s very frustrating.
Has anyone encountered similar issues? Could you provide some troubleshooting suggestions?

[Resource Configuration] Navigate to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachments: Screenshots/Logs/Monitoring]

| username: 江湖故人 | Original post link

Directly run the SQL multiple times to see if it can be reproduced.

| username: 随缘天空 | Original post link

You should execute the specific SQL in the database client to see if any data is lost, and then check whether it is an SQL issue or a problem with your code.

| username: 江湖故人 | Original post link

Could the discrepancy between the modified and inserted content be due to changes in the primary key?

| username: Miracle | Original post link

You can temporarily enable the general log on the database side, then run the program. By checking the logs in the general log, you can confirm whether these SQL statements were executed on the database side or not.

| username: zhanggame1 | Original post link

Does the table you inserted have both a primary key and a unique index? Having just one primary key is fine, but adding a unique index is questionable.

| username: 小龙虾爱大龙虾 | Original post link

Replacing the latter with the former will cause conflicts and overwrite, so it’s possible that the number of rows won’t match, but it won’t throw an error. You can write out the SQL and manually execute it for testing. Then, you can create a table and remove the unique index to find duplicate data.

| username: zhaokede | Original post link

Log the statements that are about to be executed, and compare them to see what data is missing.

| username: zhaokede | Original post link

The REPLACE statement in a table with a primary key or unique constraint will replace the existing row with a new one. In a table without a primary key or unique constraint, it functions the same way as the INSERT statement. You might also want to check the data to see if this is the cause.

| username: 路在何chu | Original post link

Have you identified which data was lost? It would be best if you could reproduce it with an example.

| username: TiDBer_Bo0lt2rY | Original post link

I have a data synchronization program that synchronizes from MySQL binlog to TiDB, and these SQL statements are running continuously.

| username: TiDBer_Bo0lt2rY | Original post link

There is no problem with the SQL itself; it has been running continuously in the production environment for some time. Today, while checking the data, I accidentally found an issue with one piece of data. I retrieved the SQL from the logs and reran it, but didn’t find any problems.

| username: TiDBer_Bo0lt2rY | Original post link

No. Run it again, it’s normal.

| username: TiDBer_9hpPRMwf | Original post link

Phenomenon 2 might be related to the previous bug where in-place updates did not lock.

  1. In-place update value locking issue:
    Keys with duplicate values in INSERT IGNORE and REPLACE statements. In versions prior to v6.1.6, these keys were not locked. This issue has been fixed in #42121. Unique key lock not consistent when row is changed or not · Issue #36438 · pingcap/tidb · GitHub
    Unique index keys with unchanged values in UPDATE statements. In versions prior to v6.5.2, these keys were not locked. This issue has been fixed in #36438. REPLACE statement has unexpected behavior in concurrent transactions · Issue #42121 · pingcap/tidb · GitHub
| username: TiDBer_Bo0lt2rY | Original post link

Thank you for your suggestion! I have added logs before executing each SQL statement in the program to see if I can reproduce this issue. I just tried to enable the general log by setting tidb_general_log=‘on’ on each node, but it didn’t take effect (log level issue?). I’ll try again later.

| username: TiDBer_Bo0lt2rY | Original post link

What you said should be correct. After I upgraded to 7.1, I haven’t noticed this phenomenon in the past two months.

| username: TiDBer_Bo0lt2rY | Original post link

Just confirmed that the tables involved in this transaction only have primary keys and no other unique indexes.

| username: TiDBer_Bo0lt2rY | Original post link

Executing the SQL extracted from the logs works fine. Now I have changed “REPLACE INTO” to “INSERT INTO”. This way, if there is a data issue, the transaction will fail. We need to observe for a while longer.

| username: TiDBer_Bo0lt2rY | Original post link

The transaction contains 40 “replace into” statements, which are written into 5 tables. After comparison, 10 pieces of data were lost, and they are exactly the first 10 in order.

| username: TiDBer_Bo0lt2rY | Original post link

There is an investigation. Reproducing it… is quite difficult. In the production environment, with over 1 million transactions daily, we don’t dare to make random changes. I’ve adjusted the logs, and I’ll observe for a while.