Will there be cases where data cannot be read immediately after a successful write in a TiDB cluster?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB集群内写成功后立刻读是否会存在读不到数据的情况 (Will a read issued immediately after a successful write in a TiDB cluster ever fail to see the data?)

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.7
[Encountered Problem: Phenomenon and Impact] Traditional MySQL master-slave replication relies on the binlog, so when the write load on the master instance rises, replaying the binlog on the slave becomes a bottleneck. This can lead to a situation where a write completes successfully on the master, yet an immediate read on the slave does not find the data. In TiDB’s architecture, however, the upper-layer TiDB server is stateless, and a successful write means the data has been written to TiKV’s RocksDB WAL, persisted to disk, and replicated through the Raft mechanism. So does this really mean that within a TiDB cluster there will never be a case where a successful write is immediately followed by a read that cannot find the data?
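To make the MySQL race concrete, a minimal sketch (the orders table and its columns are hypothetical):

-- On the MySQL primary
INSERT INTO orders (id, status) VALUES (1, 'paid');

-- Immediately afterwards, on a lagging replica
SELECT status FROM orders WHERE id = 1;  -- may return an empty result if binlog replay has fallen behind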

| username: xfworld | Original post link

Don’t worry, it won’t happen. Linearizability guarantees that once data has been written, it can definitely be read, and the read will be consistent.

| username: Fly-bird | Original post link

No. A write takes a row lock, and the lock is released once the write finishes, so the data can be read right after.

| username: tidb菜鸟一只 | Original post link

As long as you commit, it can definitely be found…

| username: 大飞哥online | Original post link

Rest assured.

| username: 舞动梦灵 | Original post link

After data is written, it goes into TiKV’s RocksDB, and the log is written and persisted. These operations are performed by the Leader, and a success result is returned to the client only after the write succeeds. If you query immediately after a successful write, the latest version of the data is read from the Leader’s MVCC. Moreover, TiDB’s architecture requires intra-cluster network latency within 2 ms, so after commit the data reaches the replicas almost immediately. Normally, my company does not build a master-slave structure on top of TiDB: the architecture itself tolerates the failure of any single server, so there is no need to set up an additional replica server.
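For illustration, the read-after-write sequence being discussed would look like this (table t and its columns are hypothetical):

-- Session A, connected to any TiDB server
INSERT INTO t (id, v) VALUES (1, 'hello');  -- success is returned only after a Raft majority has persisted the write

-- Session B, immediately afterwards, possibly on a different TiDB server
SELECT v FROM t WHERE id = 1;  -- served from the Region leader's latest MVCC version, so the row is visible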

| username: residentevil | Original post link

It seems that TiDB supports strongly consistent reads.

| username: zhanggame1 | Original post link

TiDB is designed to be strongly consistent, so it won’t fail to read.

| username: residentevil | Original post link

I’ll simulate it offline on my end, thank you.

| username: 随缘天空 | Original post link

TiDB is based on the Raft protocol. Under normal circumstances, the scheduling of the PD component and the Raft data-replication mechanism ensure data safety and consistency. Writes and reads are handled by the Leader node, though, so if a failure or leader transfer occurs right after the Leader completes a write, the newly elected Leader might not have the just-written data, and the read could fail to find it. The probability of this is very small, so there is no need to worry.

| username: Kongdom | Original post link

:thinking: TiDB’s read and write operations are consistent. In your example, it is [Write: Primary], [Read: Primary].
[Read: Replica] is also supported, but it has to be enabled manually (see the sketch below).
Finally, please note: in TiDB, primary and replica refer to Region replicas (the Leader and its Followers), not to nodes. This is completely different from MySQL.
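For reference, replica reads are controlled per session by the tidb_replica_read system variable; a minimal sketch:

-- Default: reads are served by the Region leader
SET SESSION tidb_replica_read = 'leader';

-- Opt this session in to follower reads
SET SESSION tidb_replica_read = 'follower';

-- Or allow either the leader or a follower to serve reads
SET SESSION tidb_replica_read = 'leader-and-follower';

Even with follower reads enabled, a follower confirms the leader’s commit progress before returning data, so reads remain strongly consistent.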

| username: TiDBer_小阿飞 | Original post link

Strong consistency

| username: TiDBer_小阿飞 | Original post link

TiDB is strongly consistent.

| username: zhanggame1 | Original post link

Reading from a replica won’t cause inconsistency, okay… A follower read still confirms consistency with the leader (it verifies the leader’s commit index before answering) rather than just returning its local data directly.

| username: Kongdom | Original post link

You can refer to the official documentation for consistency guarantees.

Region and Raft Protocol

Regions and their replicas maintain data consistency through the Raft protocol. Any write request can only be executed on the Leader and must be written to a majority of replicas (with the default configuration of 3 replicas, every request must succeed on at least two) before a successful write is returned to the client.
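If you want to confirm the configured replica count on a running cluster, one way (it requires the CONFIG privilege) is:

SHOW CONFIG WHERE type = 'pd' AND name = 'replication.max-replicas';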

| username: Kongdom | Original post link

Yes, you are right.

| username: cassblanca | Original post link

Consistent reads and basic operations are beyond doubt. If the Raft consensus algorithm could not guarantee consistency, then Stanford would have to retract the original Raft paper, and all the infrastructure built on it would collapse.

| username: 有猫万事足 | Original post link

The only thing to note is that TiDB’s default transaction isolation level is RR (repeatable read), not RC (read committed).

When the isolation level is repeatable read, a transaction can only see data that other transactions had already committed when it started; data that is uncommitted, or committed by other transactions after it started, is not visible. Within the transaction itself, later statements can see the modifications made by earlier statements.

-- Session A
BEGIN;                   -- start transaction t1
UPDATE t SET v = v + 1;  -- modify t inside t1

-- Session B
BEGIN;                   -- start transaction t2; its snapshot predates t1's commit

-- Session A
COMMIT;                  -- commit t1

-- Session B
SELECT * FROM t;  -- under RR, t1's update is NOT visible: t1 had not committed when t2 began
COMMIT;           -- end t2
SELECT * FROM t;  -- a fresh read (new snapshot) now sees t1's update
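To check or switch the isolation level for a session, the standard MySQL syntax works (RC takes effect in pessimistic transaction mode, which is the default in recent versions):

SELECT @@transaction_isolation;  -- 'REPEATABLE-READ' by default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- switch this session to RC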

| username: Soysauce520 | Original post link

Impossible. A write is reported successful only after a majority of the replicas have the data.

| username: Kongdom | Original post link

:+1: :+1: :+1: The theoretical knowledge is so solid~