The database generates completely identical duplicate data every day, either 2 or 3 entries. The SQL logic is to update first, and if the update returns 0, then insert

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据库每天都产生完全一致的重复数据,2条或者3条都有,sql逻辑是先更新,如果更新为0则插入

| username: 学无止境

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4
[Reproduction Path] Save or update operation
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]
The database generates completely identical duplicate rows every day, sometimes 2 and sometimes 3. The application logic is to update first, and if the update affects 0 rows, to insert.

| username: 我是咖啡哥 | Original post link

Can it be reproduced with SQL? Or take a look at the code logic.

| username: Jellybean | Original post link

According to your description, see if using the “insert into on duplicate key update” syntax can meet the business needs.
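If the table has (or can be given) a suitable unique key, the whole update-or-insert step collapses into one atomic statement. A minimal sketch under that assumption, using a hypothetical table `t` with a unique key on `biz_id`:

```sql
-- assumes a hypothetical table:
--   CREATE TABLE t (id BIGINT PRIMARY KEY AUTO_INCREMENT,
--                   biz_id BIGINT UNIQUE, val INT);
-- inserts the row, or updates val in place if biz_id already exists
INSERT INTO t (biz_id, val)
VALUES (1001, 1)
ON DUPLICATE KEY UPDATE val = VALUES(val);
```

This only applies if a unique key can be added on the deduplication dimension; later in the thread the original poster notes that dimension currently has no unique index.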

| username: zhanggame1 | Original post link

Is it completely duplicated? Does the table not have a primary key?

| username: tidb菜鸟一只 | Original post link

My understanding is that it is caused by concurrency.

Session 1
SQL1: update test set val=1 where id=100; -- returns 0 because the row does not exist yet
SQL2: insert into test(id, val) values (100, 1);

Session 2
SQL1: update test set val=1 where id=100; -- if executed after session 1's SQL2 has committed, it returns 1; if executed before, it returns 0
SQL2: insert into test(id, val) values (100, 1); -- only runs when SQL1 returned 0; run concurrently with session 1, it inserts id=100 a second time

So as long as SQL1 of session 1 and SQL1 of session 2 run in parallel (before either insert commits), both updates return 0, both sessions execute SQL2, and the same row is inserted twice.

| username: zhanggame1 | Original post link

This design is quite strange. Generally, this should be wrapped in a transaction:
begin;
select ... for update;
check whether the row exists;
insert or update accordingly;
commit;
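The steps above can be sketched as follows. This is only a sketch under assumptions, using the same hypothetical table `t` with a business column `biz_id` (not a unique key):

```sql
-- lock-then-upsert sketch; table and column names are hypothetical
BEGIN;

-- lock any existing rows for this business key so a concurrent
-- session blocks here instead of also seeing "no row"
SELECT id FROM t WHERE biz_id = 1001 FOR UPDATE;

-- application-side check on the SELECT result:
--   if a row came back:  UPDATE t SET val = 1 WHERE biz_id = 1001;
--   if no row came back: INSERT INTO t (biz_id, val) VALUES (1001, 1);

COMMIT;
```

One caveat: when no row exists yet there is nothing for `SELECT ... FOR UPDATE` to lock (TiDB does not take gap locks on non-unique columns), so two sessions can still both see an empty result and both insert; a unique key or an application-level lock is needed to close that window completely.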

| username: 有猫万事足 | Original post link

You need to note that the default transaction isolation level of TiDB is RR, not RC.

When the transaction isolation level is Repeatable Read, a transaction can only read data that other transactions had already committed when the transaction started. Data that is uncommitted, or that other transactions commit after the transaction starts, is not visible. Within the transaction itself, statements can see the modifications made by earlier statements in the same transaction.

| username: cassblanca | Original post link

Strongly recommend optimizing the business scenario and then adjusting the default transaction isolation level of TiDB according to the requirements.

| username: zhanggame1 | Original post link

It can be dynamically adjusted, just set it at the session level.
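For example, the isolation level can be switched for the current session only, leaving the instance default untouched (note that TiDB applies READ-COMMITTED only to pessimistic transactions):

```sql
-- switch just this session to RC; other sessions keep the default RR
SET SESSION transaction_isolation = 'READ-COMMITTED';
```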

| username: redgame | Original post link

Sure, although I didn’t quite understand what the question is…

| username: 学无止境 | Original post link

This requires a unique index, but with a unique index, duplicate data will not be inserted.

| username: 学无止境 | Original post link

Caused by concurrent requests for the same data.

| username: 学无止境 | Original post link

The rows are duplicates in every column except the primary key.

| username: 学无止境 | Original post link

It was caused by concurrency, but I didn’t understand your solution.

| username: 学无止境 | Original post link

What level should it be adjusted to? Caused by concurrency.

| username: zhanggame1 | Original post link

You can use REPLACE INTO: if a row with the same primary key or unique key already exists, it is deleted and the new row is inserted; otherwise the row is simply inserted.
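A minimal sketch, again assuming the hypothetical table `t` with a unique key on `biz_id` (REPLACE INTO only deduplicates when the table has a primary key or unique key to conflict on):

```sql
-- deletes any row whose primary/unique key collides, then inserts
REPLACE INTO t (biz_id, val) VALUES (1001, 1);
```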

| username: 学无止境 | Original post link

This requires a unique index or primary key, but the dimension being updated does not have one.

| username: tidb菜鸟一只 | Original post link

Try using REPLACE INTO with a primary key, don’t use the original method anymore.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.