Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 集群短时间内多个tikv节点异常FATAL报错后重启 (Multiple TiKV nodes in the cluster restarted after abnormal FATAL errors within a short period)
[TiDB Usage Environment] Production environment, 3 TiDB, 3 PD, 8 TiKV
[TiDB Version] v4.0.11
[Reproduction Path] No special operations during the fault
[Encountered Problem: Symptoms and Impact]
Read/write timeout, slow response
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
Screenshot of the FATAL error log from the first TiKV
Of the 8 TiKV instances, 4 hit this FATAL error.
The errors occurred at the following timestamps:
[2023/05/28 18:57:50.251 +08:00]
[2023/05/28 18:57:51.442 +08:00]
[2023/05/28 18:59:46.938 +08:00]
[2023/05/28 19:00:38.893 +08:00]
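To put a number on "short period", here is a minimal Python sketch that parses the four timestamps above and measures the gap between the first and last crash. The helper names (`parse_ts`, `crash_window_seconds`) are made up for illustration, and the format string is only an assumption based on how the timestamps are written here.

```python
from datetime import datetime

# Timestamps copied from the FATAL log lines listed above.
FATAL_TIMES = [
    "2023/05/28 18:57:50.251 +08:00",
    "2023/05/28 18:57:51.442 +08:00",
    "2023/05/28 18:59:46.938 +08:00",
    "2023/05/28 19:00:38.893 +08:00",
]


def parse_ts(text):
    """Parse a 'YYYY/MM/DD HH:MM:SS.mmm +HH:MM' timestamp (assumed layout)."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f %z")


def crash_window_seconds(stamps):
    """Return the seconds elapsed between the earliest and latest timestamp."""
    times = sorted(parse_ts(s) for s in stamps)
    return (times[-1] - times[0]).total_seconds()


if __name__ == "__main__":
    window = crash_window_seconds(FATAL_TIMES)
    print(f"{window:.3f} s between the first and last FATAL")
```

All four instances failed within roughly three minutes of each other, and the first two failed about one second apart.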
Try rebuilding the index to see if it resolves the error.
I didn’t see any index errors.
Isn't this the bug? "table: fix insert into _tidb_rowid panic and rebase it if needed (#22062)" by ti-srebot, Pull Request #22359 in pingcap/tidb on GitHub. Generally, an index out of range error is triggered by an abnormal situation, and the code doesn't even return the error correctly…
The page opens to a 404 error
Are you referring to this?
But this is from the release notes for v4.0.11, and I am already running v4.0.11.
Do you have any solutions in mind? Or any methods to obtain more error information?
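One low-effort way to gather more context, offered only as a sketch: pull each FATAL entry out of the TiKV log together with the few lines logged just before it, then compare the four instances. The code below assumes each log line starts with `[timestamp] [LEVEL]`, as the timestamps in this thread suggest. The function name `fatal_with_context` and the log path in the usage comment are placeholders, not anything provided by TiKV.

```python
import re
import sys
from collections import deque

# Assumed line layout: [timestamp] [LEVEL] rest-of-line
ENTRY = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\]")


def fatal_with_context(lines, before=5):
    """Yield (context, fatal_line) pairs.

    `context` holds up to `before` lines that preceded the FATAL line.
    """
    recent = deque(maxlen=before)
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY.match(line)
        if match and match.group("level") == "FATAL":
            yield list(recent), line
        recent.append(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fatal_context.py /path/to/tikv.log  (placeholder path)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for context, fatal in fatal_with_context(fh):
            print("\n".join(context))
            print(">>> " + fatal)
            print("-" * 40)
```

Running it against the log of each affected instance and comparing the lines before every FATAL entry may show whether the four crashes share a trigger.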
The "index" here means an array index going out of bounds; it is not a problem with a table index.
As for the _tidb_rowid error you mentioned, that is a TiDB issue, not a TiKV issue, so these are two different things.
TiDB has had many index-out-of-range issues, so this problem shows up more or less at random. If you hit it frequently, consider upgrading.
I don't currently know which version fixed this array out-of-bounds issue. Should I upgrade to a stable 5.0 release?
How about trying to rebuild the index?