TiFlash error log: region does not exist

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash error log: region does not exist.

| username: wzf0072

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.2
[Reproduction Path] Operations performed that led to the issue
tidb_gc_life_time set to 1 hour
max_execution_time set to 30 minutes
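For reference, the two settings above would normally be applied with SET GLOBAL statements. The Python sketch below only builds that SQL text and does not connect to TiDB. It assumes that tidb_gc_life_time takes a Go-style duration string (its default is shown as 10m0s) and that max_execution_time is given in milliseconds, so 30 minutes becomes 1800000. The helper names are hypothetical.

```python
from datetime import timedelta


def go_duration(d: timedelta) -> str:
    """Format whole seconds like Go's Duration.String(),
    e.g. 1 hour -> '1h0m0s', 10 minutes -> '10m0s'."""
    total = int(d.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}h{minutes}m{seconds}s"
    if minutes:
        return f"{minutes}m{seconds}s"
    return f"{seconds}s"


def settings_sql(gc_life_time: timedelta, max_execution_time: timedelta) -> list:
    """Build SET GLOBAL statements for the two settings (hypothetical helper)."""
    millis = max_execution_time // timedelta(milliseconds=1)  # exact integer ms
    return [
        f"SET GLOBAL tidb_gc_life_time = '{go_duration(gc_life_time)}';",
        f"SET GLOBAL max_execution_time = {millis};",
    ]


for stmt in settings_sql(timedelta(hours=1), timedelta(minutes=30)):
    print(stmt)
```

The statements printed for the values in this post are `SET GLOBAL tidb_gc_life_time = '1h0m0s';` and `SET GLOBAL max_execution_time = 1800000;`.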
[Encountered Issue: Problem Phenomenon and Impact]
SQL execution on the TiDB servers stalls: statements that normally finish in under 1 second run for more than 1 minute without returning results during the fault period. After all SQL is killed, TiFlash Request Duration and Request Handle Duration show no sign of decreasing, and the TiFlash error log reports: region 202959 does not exist.

[Resource Configuration] Navigate to TiDB Dashboard - Cluster Info - Hosts and screenshot this page


[Attachments: Screenshots/Logs/Monitoring]
[2023/06/01 20:29:58.163 +08:00] [WARN] [RegionTable.cpp:263] ["region 202959 does not exist."] [thread_id=72]
[2023/06/01 20:30:00.164 +08:00] [WARN] [RegionTable.cpp:263] ["region 202986 does not exist."] [thread_id=72]
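To see whether these warnings keep naming a few regions or many different ones, the log can be tallied per region ID. This is a minimal sketch that assumes the line format shown above; it accepts straight or curly quotes because the translated post mixes them. The function name and the log path in the usage line are hypothetical.

```python
import re
from collections import Counter

# Matches the RegionTable warning shown above; straight or curly quotes allowed.
MISSING_REGION = re.compile(
    r'\[(?P<ts>[^\]]+)\] \[WARN\] \[RegionTable\.cpp:\d+\] '
    r'\[["\u201c]region (?P<region>\d+) does not exist\.["\u201d]\]'
)


def count_missing_regions(lines):
    """Return a Counter mapping region ID -> number of 'does not exist' warnings."""
    counts = Counter()
    for line in lines:
        match = MISSING_REGION.search(line)
        if match:
            counts[int(match.group("region"))] += 1
    return counts
```

Usage, with a hypothetical log path: `with open("tiflash.log", encoding="utf-8") as f: print(count_missing_regions(f).most_common(10))`.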

| username: tidb狂热爱好者

First, try taking TiFlash offline and then bringing it back online.
Second, test the disk speed.
Consider the possibility that the hardware is not up to standard and its I/O read/write performance is too poor. If you are using Alibaba Cloud, this is the issue 99% of the time.
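One rough way to act on the "test the disk speed" suggestion is to time small synchronous writes, each followed by an fsync. Dedicated tools such as fio are the usual choice for this. The sketch below is only a hypothetical probe: its numbers depend on the filesystem and on any caching layer in between, including the virtualization layer.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies_ms(path, writes=100, block_size=4096):
    """Append `writes` blocks of `block_size` random bytes to `path`,
    calling os.fsync after each write; return per-write latency in ms."""
    block = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to flush the data to the storage device
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Average, approximate p99 and maximum of a list of latencies."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return {"avg_ms": statistics.mean(ordered), "p99_ms": p99, "max_ms": ordered[-1]}


if __name__ == "__main__":
    # Hypothetical probe location: point this at the TiFlash data disk instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(summarize(fsync_latencies_ms(os.path.join(tmp, "probe.bin"))))
```

Comparing the p99 and maximum values with the 3 ms maximum latency reported later in the thread is more informative than looking at the average alone.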

| username: wzf0072

We are using hyper-converged servers with all-flash storage, and the maximum observed I/O latency is 3 ms. After setting the TiFlash replica count of all tables to 0, the high latency still persisted for about 20 minutes.
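The replica change described above is made per table with ALTER TABLE ... SET TIFLASH REPLICA 0, and information_schema.tiflash_replica lists the tables that currently have TiFlash replicas. The sketch below only turns (schema, table) rows from that query into statements. The helper names and the sample table names are hypothetical.

```python
# Run this in TiDB to get the (schema, table) rows the helper expects.
LIST_REPLICAS_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tiflash_replica;"
)


def quote_ident(name: str) -> str:
    """Quote a MySQL/TiDB identifier with backticks, doubling embedded ones."""
    return "`" + name.replace("`", "``") + "`"


def drop_tiflash_replicas_sql(rows):
    """Build one ALTER TABLE ... SET TIFLASH REPLICA 0 per (schema, table) row."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} SET TIFLASH REPLICA 0;"
        for schema, table in rows
    ]


# Hypothetical rows, as the query above might return them.
for stmt in drop_tiflash_replicas_sql([("app", "orders"), ("app", "order_items")]):
    print(stmt)
```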

| username: tidb狂热爱好者

I've thought through what you changed. The original GC setting of 10 minutes was fine, but raising it to 1 hour will drastically degrade database performance: far more obsolete key versions are kept and have to be scanned, which is why the average SQL takes 8 minutes.

You need to change GC back to 10 minutes immediately, or the database will keep getting slower.
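The reasoning above can be made concrete with a deliberately simplified model. If one hot row is rewritten at a steady rate, every version written inside the GC life time window is still kept, so a scan touching that row may have to step over all of them. The function below is a hypothetical toy, not how TiKV actually counts versions. It ignores the GC run interval, compaction, and rows that are never updated.

```python
def versions_in_gc_window(writes_per_second: float, gc_life_time_s: float) -> int:
    """Toy estimate: MVCC versions of ONE frequently rewritten row that are
    newer than the GC safe point, so GC cannot remove them yet."""
    return int(writes_per_second * gc_life_time_s)


for label, seconds in (("10 minutes", 600), ("1 hour", 3600)):
    print(label, versions_in_gc_window(1.0, seconds))
```

At one write per second, the model gives 600 retained versions for a 10-minute life time and 3600 for 1 hour. That is a sixfold difference for that row, and only for data that is actually being rewritten.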

| username: wzf0072

With GC set to 10 minutes, the error "read tso: 441829476756619266 is smaller than tidb gc safe point" appeared frequently.
On the 31st there were 4 performance incidents; after I changed GC to 1 hour yesterday, only 1 occurred.
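A TSO stores a physical timestamp in milliseconds in its high bits and an 18-bit logical counter in its low bits. That means the read TSO in the error can be converted into a wall-clock time to see how old the statement's snapshot was. TiDB's built-in TIDB_PARSE_TSO() function does this in SQL. The Python sketch below is a standalone equivalent for illustration.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # low 18 bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_tso(tso: int):
    """Split a TSO into (physical_ms, logical, UTC datetime)."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    return physical_ms, logical, EPOCH + timedelta(milliseconds=physical_ms)


physical_ms, logical, utc_time = decode_tso(441829476756619266)
local_time = utc_time.astimezone(timezone(timedelta(hours=8)))  # the logs use +08:00
print(physical_ms, logical, local_time.isoformat())
```

For the TSO in the post, this gives physical 1685445696856 ms and logical 2, which is 2023-05-30 19:21:36.856 at +08:00. Comparing that time with when the error was logged shows how long the statement had been reading from that snapshot.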

| username: zhanggame1

In some cases, disk performance drops significantly under virtualization. Check whether it is possible to enable direct disk access for the virtual machine.

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.