Cluster Crash After Shutting Down a TiKV Node in Offline (Non-Tombstone) State

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv stop offline 非墓碑状态停机导致崩溃

| username: tugcgithub2021

Our production system crashed. During a scale-in, a TiKV node went into the Offline state rather than the Tombstone state, and it was then shut down improperly. The entire cluster collapsed: it cannot start, and clients cannot connect to read or write. PD still seems to hold connection information for this TiKV node.

Is there any way to recover the data now? Can we extract the data directly from the storage path of the TiKV node? Or, assuming the remaining TiKV nodes in the cluster hold complete data, how can we repair and restart the cluster?

| username: h5n1

Run tiup cluster display to check the cluster status. Are there multiple TiKV instances on the same machine? Your situation looks like it was caused by that: with several instances on one machine, shutting it down likely took more than one replica away at once. In this case, the only option is multi-replica failure recovery.
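To make the "multiple TiKV instances on one machine" check concrete, here is a minimal sketch in Python. It assumes PD is still reachable and that you have saved the JSON printed by the pd-ctl store command (store entries with id, address, and state_name fields). The function name summarize_stores and the sample data are made up for illustration and are not taken from the original thread.

```python
import json
from collections import defaultdict


def summarize_stores(pd_store_json):
    """Group TiKV stores by host and list stores that are not Up.

    Assumes the input looks like pd-ctl `store` output:
    {"stores": [{"store": {"id": ..., "address": "host:port",
                           "state_name": "Up"}}, ...]}
    """
    data = json.loads(pd_store_json)
    by_host = defaultdict(list)
    for item in data.get("stores", []):
        store = item["store"]
        # Strip the port so instances on the same machine share one key.
        host = store["address"].rsplit(":", 1)[0]
        by_host[host].append((store["id"], store.get("state_name", "Unknown")))

    # Hosts running more than one TiKV instance.
    shared_hosts = {h: s for h, s in by_host.items() if len(s) > 1}
    # Store IDs in any state other than Up (Offline, Down, Tombstone, ...).
    not_up = sorted(
        sid for stores in by_host.values() for sid, state in stores if state != "Up"
    )
    return shared_hosts, not_up


# Hypothetical sample data, for illustration only.
SAMPLE = json.dumps({
    "count": 3,
    "stores": [
        {"store": {"id": 1, "address": "10.0.0.1:20160", "state_name": "Up"}},
        {"store": {"id": 4, "address": "10.0.0.1:20161", "state_name": "Offline"}},
        {"store": {"id": 5, "address": "10.0.0.2:20160", "state_name": "Down"}},
    ],
})

shared_hosts, not_up = summarize_stores(SAMPLE)
print("Hosts with multiple TiKV instances:", shared_hosts)
print("Store IDs not in Up state:", not_up)
```

If a host shows up in the first result and its stores also appear in the second, several replicas may have been lost together, which is the situation where multi-replica failure recovery applies.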

Refer to the following: