Is it possible for a write transaction with a smaller timestamp than read_ts to commit successfully when using read_ts for reading?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用read_ts读取,存在比read_ts更小的写事务提交成功吗

| username: TiDBer_ivNHRxaK

When using read_ts to read, is it possible for a write transaction with a smaller timestamp than read_ts to be successfully committed?

| username: FutureDB | Original post link

Do you have any specific scenarios or cases you have encountered?

| username: Jellybean | Original post link

Are you describing an issue in the Stale Read scenario?
If so, Stale Read is a mechanism for reading historical data: it reads a committed historical version rather than the latest committed data.

| username: dba远航 | Original post link

Didn’t understand.

| username: 哈喽沃德 | Original post link

If the commit_ts of a write transaction is smaller than the read_ts of the current node, it will not be read. However, there may be situations in the cluster where the timestamps of different nodes are not completely synchronized, which could result in some transactions having a commit_ts smaller than the read_ts of certain nodes.

| username: 江湖故人 | Original post link

Where did you see read_ts? I couldn’t find it in the documentation. The timestamp when reading must be greater than the timestamp when committing, otherwise, it would be a dirty read.

| username: 小龙虾爱大龙虾 | Original post link

Stale Read is a mechanism for reading historical data versions, allowing you to read historical data stored in TiDB. With the Stale Read feature, you can read the corresponding historical data from a specified point in time or time range, thereby avoiding delays caused by data synchronization. When using Stale Read, TiDB will randomly select a replica to read data by default, thus utilizing all replicas.
Link: Search | PingCAP Documentation Center

When serving a Stale Read, TiKV checks that the Read ts does not exceed the Region's Safe ts; if it does, the replica is not yet ready to serve that read. On the Leader, the Safe ts advances with the Resolved ts, while the Safe ts on a Follower is synchronized from the Leader via the CheckLeader RPC: once the Follower has applied up to the Leader's index, it can advance its Safe ts to the Leader's Safe ts. Therefore, Read ts <= Safe ts <= Resolved ts < any subsequent Commit ts.

| username: Kongdom | Original post link

:thinking: Aren’t timestamps all given by the leader PD? There shouldn’t be a situation where different nodes are not completely consistent, right?

| username: TiDBer_ivNHRxaK | Original post link

Not sure if it’s called Stale Read.
For example, there’s a key(commit_ts=10) = v10. Now a new write transaction gets commit_ts=20, and a concurrent read transaction gets ts=30. While the read at ts=30 is reading key(commit_ts=10), the commit_ts=20 transaction only then starts executing and generates key(commit_ts=20) = v20. Logically, the read transaction with ts=30 should have read v20.

| username: TiDBer_ivNHRxaK | Original post link

Two concurrent transactions, one is a write transaction and the other is a read transaction, with the read transaction having a larger timestamp. Due to concurrent execution, the read transaction may execute before the write transaction starts. The read transaction reads the version before this write transaction, but logically, the read transaction should read the version after this write transaction is completed.

| username: Jellybean | Original post link

  1. For both read and write transactions, the PD Leader centrally assigns strictly increasing timestamps, covering both the transaction start timestamp (start_ts) and the commit timestamp (commit_ts). This is what you call ts in your description. Concurrent requests are therefore ordered by their start timestamps.

  2. The snapshot isolation level will always read the results of transactions that were completed and committed before the start of the current transaction. Uncommitted transactions will not be accessed. Therefore, as long as the write transaction with a start time of 20 has not been committed before 30, the read transaction with a start time of 30 will not access its data. It will only read up to v10 and complete the read transaction process.
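The visibility rule in point 2 can be sketched roughly as follows. This is a toy in-memory model, not TiKV code: `MvccStore`, `commit`, and `snapshot_read` are hypothetical names chosen for illustration, and locks and uncommitted data are left out entirely.

```python
from bisect import bisect_right


class MvccStore:
    """Toy multi-version store: each key keeps its committed versions sorted by commit_ts."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending by commit_ts

    def commit(self, key, commit_ts, value):
        # Record a committed version; uncommitted writes never reach this store.
        versions = self._versions.setdefault(key, [])
        versions.append((commit_ts, value))
        versions.sort(key=lambda pair: pair[0])

    def snapshot_read(self, key, read_ts):
        # Return the newest version whose commit_ts <= read_ts, or None if there is none.
        versions = self._versions.get(key, [])
        commit_timestamps = [ts for ts, _ in versions]
        idx = bisect_right(commit_timestamps, read_ts)
        return versions[idx - 1][1] if idx else None


store = MvccStore()
store.commit("key", 10, "v10")
first_read = store.snapshot_read("key", 30)   # only v10 has been committed

store.commit("key", 40, "v40")                # committed after the snapshot timestamp
second_read = store.snapshot_read("key", 30)  # the ts=30 snapshot is unaffected
```

Both reads at ts=30 return `v10`: a version committed with a larger timestamp is invisible to the snapshot, and a write that has not committed is simply not in the store.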

I suggest you review the official documentation related to transaction descriptions.

| username: Kongdom | Original post link

You mean there’s no need to introduce concurrency, right? The read transaction reads 0, and the write transaction writes 1?
Field A has a value of 0. At 00:01, the write transaction starts. At 00:02, the read transaction starts. At 00:03, the read transaction ends. At 00:04, the write transaction changes the value of field A to 1. At 00:05, the write transaction ends.

If these run as transactions, my understanding is that the read transaction would not finish at 00:03; it would wait until the write transaction ends at 00:05 and then read 1.

| username: 小龙虾爱大龙虾 | Original post link

So you weren’t talking about Stale Read :rofl:

| username: TiDBer_ivNHRxaK | Original post link

Yes. A snapshot read with ts=30 will update the MaxTs of the TiKV node, and the ts of new committed transactions must be greater than or equal to MaxTs+1. Therefore, a write transaction with ts=20 will be prevented from committing.
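A minimal sketch of the MaxTs idea described above, as a toy model only. `ToyNode`, `choose_commit_ts`, and the other names are hypothetical and not TiKV's API. The sketch also assumes that a conflicting write gets its commit ts pushed forward to MaxTs+1 rather than being rejected outright, which is a modelling choice made here for illustration.

```python
class ToyNode:
    """Toy storage node that remembers the largest read timestamp it has served."""

    def __init__(self):
        self.max_ts = 0
        self.committed = {}  # key -> list of (commit_ts, value)

    def snapshot_read(self, key, read_ts):
        # Every read raises max_ts, so later commits cannot land at or below read_ts.
        self.max_ts = max(self.max_ts, read_ts)
        visible = [(ts, v) for ts, v in self.committed.get(key, []) if ts <= read_ts]
        return max(visible, key=lambda pair: pair[0])[1] if visible else None

    def choose_commit_ts(self, proposed_ts):
        # Assumption of this sketch: the final commit ts is at least max_ts + 1.
        return max(proposed_ts, self.max_ts + 1)

    def commit(self, key, proposed_ts, value):
        commit_ts = self.choose_commit_ts(proposed_ts)
        self.committed.setdefault(key, []).append((commit_ts, value))
        return commit_ts


node = ToyNode()
ts_first = node.commit("key", 10, "v10")   # no reads yet, so the proposed ts is kept
seen = node.snapshot_read("key", 30)       # raises max_ts to 30
ts_second = node.commit("key", 20, "v20")  # ts=20 is no longer allowed
```

In this model the second write ends up with a commit ts of 31 instead of 20, so repeating the read at ts=30 still returns `v10` and the snapshot stays consistent.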

| username: Jellybean | Original post link

Writes do not block reads, and reads do not block writes.

Only writes will conflict with each other.