The database generates completely identical duplicate data every day, either 2 or 3 entries. The SQL logic is to update first, and if the update returns 0, then insert

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据库每天都产生完全一致的重复数据,2条或者3条都有,sql逻辑是先更新,如果更新为0则插入

| username: 学无止境

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4
[Reproduction Path] Save or update operation
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]
The database generates completely identical duplicate rows every day, sometimes 2 and sometimes 3. The application logic is to update first, and if the update affects 0 rows, to insert.

| username: 我是咖啡哥 | Original post link

Can it be reproduced with SQL? Or take a look at the code logic.

| username: Jellybean | Original post link

According to your description, see if using the “insert into on duplicate key update” syntax can meet the business needs.
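If the table has (or can be given) a suitable unique key, the whole update-or-insert step collapses into one atomic statement. A minimal sketch under that assumption, using a hypothetical table `t` with a unique key on `biz_id`:

```sql
-- assumes a hypothetical table:
--   CREATE TABLE t (id BIGINT PRIMARY KEY AUTO_INCREMENT,
--                   biz_id BIGINT UNIQUE, val INT);
-- inserts the row, or updates val in place if biz_id already exists
INSERT INTO t (biz_id, val)
VALUES (1001, 1)
ON DUPLICATE KEY UPDATE val = VALUES(val);
```

This only applies if a unique key can be added on the deduplication dimension; later in the thread the original poster notes that dimension currently has no unique index.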

| username: zhanggame1 | Original post link

Is it completely duplicated? Does the table not have a primary key?

| username: tidb菜鸟一只 | Original post link

My understanding is that it is caused by concurrency.

Session 1
SQL1: update test set val=1 where id=100; -- returns 0 because the row does not exist yet
SQL2: insert into test(id, val) values (100, 1);

Session 2
SQL1: update test set val=1 where id=100; -- if executed after session 1's SQL2 has committed, it returns 1; if executed before, it returns 0
SQL2: insert into test(id, val) values (100, 1); -- only runs when SQL1 returned 0; run concurrently with session 1, it inserts id=100 a second time

So as long as SQL1 of session 1 and SQL1 of session 2 run in parallel (before either insert commits), both updates return 0, both sessions execute SQL2, and the same row is inserted twice.

| username: zhanggame1 | Original post link

This design is quite strange. Generally, this should be wrapped in a transaction:
begin;
select ... for update;
check whether the row exists;
insert or update accordingly;
commit;
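The steps above can be sketched as follows. This is only a sketch under assumptions, using the same hypothetical table `t` with a business column `biz_id` (not a unique key):

```sql
-- lock-then-upsert sketch; table and column names are hypothetical
BEGIN;

-- lock any existing rows for this business key so a concurrent
-- session blocks here instead of also seeing "no row"
SELECT id FROM t WHERE biz_id = 1001 FOR UPDATE;

-- application-side check on the SELECT result:
--   if a row came back:  UPDATE t SET val = 1 WHERE biz_id = 1001;
--   if no row came back: INSERT INTO t (biz_id, val) VALUES (1001, 1);

COMMIT;
```

One caveat: when no row exists yet there is nothing for `SELECT ... FOR UPDATE` to lock (TiDB does not take gap locks on non-unique columns), so two sessions can still both see an empty result and both insert; a unique key or an application-level lock is needed to close that window completely.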

| username: 有猫万事足 | Original post link

You need to note that the default transaction isolation level of TiDB is RR, not RC.

When the transaction isolation level is Repeatable Read, a transaction can only read data that other transactions had already committed when the transaction started. Data that is uncommitted, or that other transactions commit after the transaction starts, is not visible. Within the transaction itself, statements can see the modifications made by earlier statements in the same transaction.

| username: cassblanca | Original post link

Strongly recommend optimizing the business scenario and then adjusting the default transaction isolation level of TiDB according to the requirements.

| username: zhanggame1 | Original post link

It can be dynamically adjusted, just set it at the session level.
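For example, the isolation level can be switched for the current session only, leaving the instance default untouched (note that TiDB applies READ-COMMITTED only to pessimistic transactions):

```sql
-- switch just this session to RC; other sessions keep the default RR
SET SESSION transaction_isolation = 'READ-COMMITTED';
```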

| username: redgame | Original post link

Sure, although I didn’t quite understand what the question is…

| username: 学无止境 | Original post link

This requires a unique index, but with a unique index, duplicate data will not be inserted.

| username: 学无止境 | Original post link

Caused by concurrent requests for the same data.

| username: 学无止境 | Original post link

The rows are duplicates in every column except the primary key.

| username: 学无止境 | Original post link

It was caused by concurrency, but I didn’t understand your solution.

| username: 学无止境 | Original post link

What level should it be adjusted to? Caused by concurrency.

| username: zhanggame1 | Original post link

You can use REPLACE INTO: if a row with the same primary key or unique key already exists, it is deleted and the new row is inserted; otherwise the row is simply inserted.
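A minimal sketch, again assuming the hypothetical table `t` with a unique key on `biz_id` (REPLACE INTO only deduplicates when the table has a primary key or unique key to conflict on):

```sql
-- deletes any row whose primary/unique key collides, then inserts
REPLACE INTO t (biz_id, val) VALUES (1001, 1);
```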

| username: 学无止境 | Original post link

This requires a unique index or primary key, but the dimension being updated does not have one.

| username: tidb菜鸟一只 | Original post link

Try using REPLACE INTO with a primary key, don’t use the original method anymore.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.