The data retrieved during TiDB concurrency is not the latest

translator_bot · June 22, 2024, 4:53am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb并发的时候拿的数据不是最新的

| username: zhimadi

[TiDB Usage Environment] Production Environment / Testing
[TiDB Version] v5.4.2
[Reproduction Path] What operations were performed to cause the issue
There are two existing businesses, A and B:
Business A starts a transaction with the following SQL statements:

SELECT * from t1 where id=1 FOR UPDATE;
...
UPDATE t1 SET num = num-1 WHERE id=1;

Business B also has a query in the transaction with the following SQL statement:

SELECT * from t1 where id=1 FOR UPDATE;

Execute businesses A and B simultaneously or sequentially.

[Encountered Issue: Problem Phenomenon and Impact]
When business A has not completed, business B will block and wait. After business A completes and the transaction commits, business B will continue to run, but the data retrieved by SELECT * from t1 where id=1 FOR UPDATE in business B is the data before business A’s modification.

Repeated testing shows that after one transaction commits and releases the lock, the other transaction still retrieves the old data. This causes many data issues.

Could you please explain the reason for this and suggest any solutions?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

translator_bot · June 22, 2024, 4:53am

| username: xiexin | Original post link

This issue is related to MVCC.
Transactions A and B start, and the data version obtained by both transactions is the data version when the first SQL statement is executed.
If transaction B wants to query or modify data based on transaction A, then transaction B should start after transaction A commits.

translator_bot · June 22, 2024, 4:53am

| username: tidb狂热爱好者 | Original post link

It was originally designed this way.

translator_bot · June 22, 2024, 4:53am

| username: zhanggame1 | Original post link

It’s easy to solve. You haven’t changed the transaction level, right? TiDB’s default transaction level is RR. Change it to RC level. In the default RR level, after transaction A starts, if transaction B starts later, the changes made by transaction A will not be visible to transaction B.

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

This does not conform to the situation of phantom reads. If another transaction is started after committing, then the lock would be useless.

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

How should this situation be resolved, expert?

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

Currently using the default settings. Would there be any other impacts if the rc level is used?

translator_bot · June 22, 2024, 4:53am

| username: zhanggame1 | Original post link

It doesn’t have much impact, and the performance of RC is slightly better. RR isolation is not very useful as it cannot access data committed by other transactions. MySQL uses RR due to historical reasons related to binlog, while Oracle uses the RC level.

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

However, MySQL’s repeatable read doesn’t have this issue either.

translator_bot · June 22, 2024, 4:53am

| username: zhanggame1 | Original post link

Just set global transaction_isolation='READ-COMMITTED'; and it will work.

translator_bot · June 22, 2024, 4:53am

| username: zhanggame1 | Original post link

I have studied it, and the two RR levels are different. In TiDB, once transaction A starts, transaction B will not be able to see the changes made by transaction A if it starts afterward. In MySQL, if transaction A makes changes, transaction B can still see the changes made by transaction A even if it starts before transaction A.

translator_bot · June 22, 2024, 4:53am

| username: xiexin | Original post link

I misunderstood, “for update” retrieves the latest current data.
Transaction B should get the data submitted by Transaction A.
This situation is not a phantom read.
I tested it on my side and got the latest data. I’m not sure what version and isolation level of TiDB you are using.
TiDB version: v4.0.8
Isolation level: REPEATABLE-READ

translator_bot · June 22, 2024, 4:53am

| username: xiexin | Original post link

The official documentation provides a detailed description of the monitoring indicators: TiDB 监控框架概述 | PingCAP 文档中心.

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

REPEATABLE-READ
TiDB version: v5.4.2
Hold on, I’ll test it with your script and post a screenshot.

translator_bot · June 22, 2024, 4:53am

| username: redgame | Original post link

It depends on the business requirements. Choose one among RR, optimistic locking, and pessimistic locking.

translator_bot · June 22, 2024, 4:53am

| username: zhanggame1 | Original post link

Have the test results come out? I tested version 7.1, and with “for update” I got the latest value, but without “for update” it was the old value.

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

How should one generally choose when there are requirements for sales, inventory, and financial transactions?

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

TiDB version: v5.4.2
Isolation level: REPEATABLE-READ
Transaction B will be blocked until transaction A commits and retrieves the committed data.

translator_bot · June 22, 2024, 4:53am

| username: zhimadi | Original post link

May I ask, if transaction A has tables t1 and t2, and transaction B also has tables t1 and t2. In the previous scenario, if transaction A adds a “for update” lock on table t1 and updates tables t1 and t2, will transaction B see the data in table t2 before or after the changes?

translator_bot · June 22, 2024, 4:53am

| username: xiexin | Original post link

That is expected. “For update” is a current read, so it will get the latest data. Your previous question was that after transaction 1 updates, transaction 2’s “for update” query gets the old data. This issue does not exist.