Multiple TiKV Nodes Encounter FATAL Errors and Restart Within a Short Period in the Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群短时间内多个tikv节点异常FATAL报错后重启

| username: Zhang_Zhi

[TiDB Usage Environment] Production environment, 3 TiDB, 3 PD, 8 TiKV
[TiDB Version] v4.0.11
[Reproduction Path] No special operations during the fault
[Encountered Problem: Symptoms and Impact]
Reads and writes time out, and responses are slow.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachments: Screenshots/Logs/Monitoring]
Screenshot of the FATAL error log from the first TiKV

4 of the 8 TiKV instances hit this FATAL error.

The error occurred at the following timestamps:
[2023/05/28 18:57:50.251 +08:00]
[2023/05/28 18:57:51.442 +08:00]
[2023/05/28 18:59:46.938 +08:00]
[2023/05/28 19:00:38.893 +08:00]
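
To show how close together these restarts were, the four timestamps above can be parsed and compared. This is a minimal Python sketch that assumes the bracketed timestamp format shown above. The names `TS_FORMAT`, `parse_ts`, `fatal_times`, and `window` are illustrative only and do not come from any TiKV tool.

```python
from datetime import datetime

# Assumed layout of the bracketed timestamps listed above,
# e.g. "[2023/05/28 18:57:50.251 +08:00]".
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f %z"


def parse_ts(bracketed: str) -> datetime:
    """Parse one bracketed log timestamp into a timezone-aware datetime."""
    return datetime.strptime(bracketed.strip().strip("[]"), TS_FORMAT)


# The four FATAL timestamps reported in this post.
fatal_times = [
    parse_ts(ts)
    for ts in (
        "[2023/05/28 18:57:50.251 +08:00]",
        "[2023/05/28 18:57:51.442 +08:00]",
        "[2023/05/28 18:59:46.938 +08:00]",
        "[2023/05/28 19:00:38.893 +08:00]",
    )
]

# Time between the first and the last FATAL error.
window = max(fatal_times) - min(fatal_times)
print(f"{len(fatal_times)} FATAL errors within {window.total_seconds():.3f} seconds")
```

For these timestamps the window is 168.642 seconds, so all four instances failed within just under three minutes.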

| username: zhanggame1

Try rebuilding the index to see if it resolves the error.

| username: Zhang_Zhi

I didn’t see any index errors.

| username: tidb菜鸟一只

Isn’t this the bug? table: fix insert into _tidb_rowid panic and rebase it if needed (#22062) by ti-srebot · Pull Request #22359 · pingcap/tidb · GitHub. Generally, an index out of range error is triggered by an abnormal situation, and the code doesn’t even return the exception correctly…

| username: Zhang_Zhi

The page returns a 404 error.

Are you referring to this?
But this is from the release notes for v4.0.11, and I am already running v4.0.11.

| username: Zhang_Zhi

Do you have any solutions in mind? Or is there any way to obtain more information about the error?

| username: huhaifeng

The “index” here means an array index that is out of bounds, not a problem with a database index.

As for the _tidb_rowid error you mentioned, it’s a TiDB issue, not a TiKV issue, so these are two different things.

TiDB has many index out of range issues, so this problem is basically random. If you encounter it frequently, you might consider upgrading.

| username: Zhang_Zhi

I don’t currently know which version fixed this array out-of-bounds issue. Should I upgrade to a stable 5.0 release?

| username: h5n1

It looks like this bug: TiKV running over 2 years may panic · Issue #11940 · tikv/tikv · GitHub

| username: redgame

How about trying to rebuild the index?