Aliyun Production Database Encounters RocksDB Read Error

【TiDB Usage Environment】Production Environment
【TiDB Version】TiDB v3.0.8
【Encountered Issue】One of the nodes encountered the following error:

[2022/07/14 10:31:13.553 +08:00] [WARN] [] [error-response] [err="[src/storage/kv/]: RocksDb Corruption: block checksum mismatch: expected 2341394949, got 2266477260 in /data/tidb/deploy/data/db/11559286.sst offset 1232057 size 29030"]

This caused a batch of SQL statements to fail. The other nodes are working normally, but several SST files on this node have the same problem. It is hard to tell whether this is a bad disk sector or a TiDB issue: the other files on this node are fine, and the node itself looks healthy.
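
It may help to see what this error actually checks. In RocksDB's block-based SST format, each block on disk is followed by a 5-byte trailer: one compression-type byte plus a "masked" CRC32C computed over the block data and that byte. We assume CRC32C here because we believe it was the default checksum type in the RocksDB bundled with TiKV 3.0; other types such as xxHash exist. On every read, RocksDB recomputes the CRC and compares it with the stored one. The Python sketch below reproduces only that comparison. It does not parse real SST files, and the function names are hypothetical. It also shows why the message alone cannot settle the disk-versus-database question: a mismatch only says that the bytes read back differ from the bytes that were checksummed, not where they were changed.

```python
import struct

# Reflected CRC-32C (Castagnoli) polynomial, the CRC family RocksDB uses for kCRC32c.
_POLY = 0x82F63B78
# Constant LevelDB/RocksDB add when "masking" a CRC before storing it.
_MASK_DELTA = 0xA282EAD8


def _make_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C (slow, but fine for a demonstration)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def mask(crc: int) -> int:
    """Rotate right by 15 bits and add a constant; this is the value written to disk."""
    return (((crc >> 15) | (crc << 17)) + _MASK_DELTA) & 0xFFFFFFFF


def unmask(masked: int) -> int:
    """Inverse of mask()."""
    rot = (masked - _MASK_DELTA) & 0xFFFFFFFF
    return ((rot >> 17) | (rot << 15)) & 0xFFFFFFFF


def check_block(block: bytes) -> tuple:
    """block = data + 1-byte compression type + 4-byte little-endian masked CRC.

    Returns (stored_crc, computed_crc); a read fails with
    "block checksum mismatch" when the two differ.
    """
    payload, trailer = block[:-4], block[-4:]
    (stored,) = struct.unpack("<I", trailer)
    return unmask(stored), crc32c(payload)


# Demo: build a valid block, then flip one bit to simulate corruption on disk.
payload = b"some key/value bytes" + b"\x00"  # trailing 0x00 = "no compression"
good = payload + struct.pack("<I", mask(crc32c(payload)))
stored, computed = check_block(good)
print(stored == computed)  # True

bad = bytearray(good)
bad[3] ^= 0x01
stored, computed = check_block(bytes(bad))
print(stored == computed)  # False
```

A CRC reliably detects a flipped bit like this, but it cannot say whether the flip happened on the disk, in the storage network, or in memory before the block was written.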

  1. The underlying storage of cloud hosts is complex and is itself distributed, and distributed storage can have consistency issues.
  2. Check the logs of the corresponding TiKV for any anomalies or panics.
  3. Check the OS logs, such as dmesg or the kernel log, for hardware warnings.
  4. It may be necessary to re-replicate the data by scaling the node in and out.

  1. I have checked the TiKV logs, and there are no related errors.
  2. I have checked dmesg, and there are no issues. The last boot was more than 2 years ago.
  3. Unable to confirm.
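
Because several SST files were affected, it can help to list every distinct file and block that the TiKV log reports, rather than reading through the log by hand. The Python sketch below does that. It assumes every occurrence of the error has exactly the shape of the single log line quoted above. That regex is an assumption, since only one sample is available. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed message shape, taken from the single TiKV log line quoted in this thread.
_CORRUPTION = re.compile(
    r"block checksum mismatch: expected (\d+), got (\d+) "
    r"in (\S+\.sst) offset (\d+) size (\d+)"
)


def corrupted_ssts(lines):
    """Map each SST path to the set of (offset, size) blocks reported as corrupt."""
    found = defaultdict(set)
    for line in lines:
        m = _CORRUPTION.search(line)
        if m:
            found[m.group(3)].add((int(m.group(4)), int(m.group(5))))
    return dict(found)


def report(log_path):
    """Print one summary line per affected SST file found in a TiKV log file."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for sst, blocks in sorted(corrupted_ssts(f).items()):
            print(f"{sst}: {len(blocks)} distinct bad block(s)")
```

For example, call `report("/path/to/tikv.log")`, replacing the placeholder with your deployment's log path. Running the script on each TiKV node would confirm whether the corruption is confined to this one node.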

Although the node has now recovered on its own without any intervention, I am worried that this issue will happen again.

Most likely it is cause 1 (the underlying cloud storage), but you have no evidence to prove it.

We are using Alibaba Cloud's RDS database, which reportedly uses distributed Ceph storage underneath. It appears that the underlying Ceph storage reported an error.

Networking and storage are the core technologies of the major cloud providers and the main competitive advantage of their cloud products. They rarely use open source solutions and do not disclose their technology publicly.

I have seen on-site operations staff use commands like ceph -s, so it is probably a custom build based on Ceph.

