Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: region操作丢失 (region lost after an operation)

[TiDB Usage Environment] Test
[TiDB Version]
Cluster version: v4.0.7, upgraded to v4.0.8 during handling
[Reproduction Path]
A TiKV node crashed and was taken offline. Without direct manual intervention, the store had still not finished going offline after half a month. In the test environment, business validation SQL returned “region is unavailable”. Logging in to TiDB was sluggish, and statements such as “show processlist” took a very long time. Some commands returned error 9005. Monitoring showed that:
1. The load is not high.
2. There are storage alarms.
3. There are frequent SQL operations through the dashboard (queries also fail).
4. An index is being added.
5. Statistics are frequently updated.
At this point, executing “admin show” also immediately returned the error “region is unavailable”.
[Process Handling-1]
1. Manually intervened to evict the crashed TiKV node.
2. Scaled out TiKV.
3. Adjusted the statistics update time.
4. Reduced DDL concurrency.
[Process Handling-2]
1. After the above operations, the cluster gradually became available. “admin show ddl” returned normal results, and the DDL operations were canceled.
2. Upgraded to 4.0.8.
[Process Handling-3]
The cluster recovered and became usable. We then waited for the crashed node to slowly transition to Tombstone.
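For anyone tracking a similar wait, here is a minimal sketch (not what was actually run on this cluster) that reads JSON text in the shape pd-ctl prints for the `store` command and lists stores that are neither Up nor Tombstone, along with how many regions they still hold. The field names (`stores`, `store.state_name`, `status.region_count`) are assumed from typical pd-ctl output, and the store IDs and addresses in the sample are made up for illustration.

```python
import json
from typing import Dict, List


def stores_not_yet_tombstone(store_json: str) -> List[Dict]:
    """Return stores whose state is neither Up nor Tombstone.

    Assumes JSON shaped like pd-ctl `store` output:
    {"count": N, "stores": [{"store": {...}, "status": {...}}, ...]}
    """
    data = json.loads(store_json)
    pending = []
    for item in data.get("stores", []):
        store = item.get("store", {})
        status = item.get("status", {})
        state = store.get("state_name", "Unknown")
        if state in ("Up", "Tombstone"):
            continue
        pending.append({
            "id": store.get("id"),
            "address": store.get("address"),
            "state_name": state,
            # Regions still on the store; it cannot become Tombstone
            # until this count drops to zero.
            "region_count": status.get("region_count", 0),
        })
    return pending


# Hypothetical sample data, not taken from the cluster in this post.
SAMPLE = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "10.0.0.1:20160", "state_name": "Up"},
     "status": {"region_count": 5000}},
    {"store": {"id": 4, "address": "10.0.0.4:20160", "state_name": "Offline"},
     "status": {"region_count": 1}},
    {"store": {"id": 5, "address": "10.0.0.5:20160", "state_name": "Tombstone"},
     "status": {}}
  ]
}
"""

for s in stores_not_yet_tombstone(SAMPLE):
    print(f"store {s['id']} ({s['address']}): {s['state_name']}, "
          f"{s['region_count']} region(s) remaining")
```

A store that stays Offline with a small, non-decreasing region count for a long time is a hint that some region cannot be moved off it, which matches what happened below.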
[Process Handling-4]
On March 27, monitoring PD:
The store still had not gone offline successfully. Querying the regions remaining on the abnormal store gave the result shown in the figure:
Querying by region ID displayed the region information:
Based on previous handling experience, the cache deletion operation was executed:
Then logged in to pd-ctl to check whether the region had recovered:

It always returns null. Restarting the 2 peer nodes did not change the result. The node logs show:
However, the store status in monitoring now shows as recovered, and the node also shows “state_name”: “Tombstone”.
[Question]
The deleted region information cannot be restored:
» region 102207
null
The information of the 2 nodes is as shown above. Is it safe to perform unsafe operations in this situation?