Is there a complete recovery procedure after a single TiKV instance failure?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV单个实例故障后有完整的恢复操作流程么

| username: residentevil

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.7
【Encountered Problem: Problem Phenomenon and Impact】 Four TiKV instances are deployed on one server. If the whole machine fails, every Region that had a replica on one of these four TiKV stores drops from 3 replicas to 2. Is there a specific procedure for restoring the missing replicas afterward, for example removing the failed stores from PD first and then adding new TiKV instances?

| username: 像风一样的男子 | Original post link

Refer to

| username: residentevil | Original post link

Let me take a look.

| username: residentevil | Original post link

We also use the following label configuration for TiKV:
config:
  server.labels:
    zone: z2
    host: 10.1.1.1
Restoring this might be a bit troublesome, right?

| username: 像风一样的男子 | Original post link

It’s the same. Since you configured labels, just apply the same labels again when you scale out the replacement nodes.
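
As a rough illustration of reapplying the labels, the sketch below generates a scale-out topology for four TiKV instances on one host, reusing the `zone` and `host` labels from the post above. The ports, `deploy_dir`, and `data_dir` values are hypothetical placeholders and not taken from this thread; they only show that each instance on a shared host needs its own ports and directories.

```python
# Sketch only: ports and directory paths are assumptions for illustration.
# The zone/host label values come from the configuration quoted in the thread.

def tikv_scale_out_topology(host, zone, instances=4,
                            base_port=20160, base_status_port=20180):
    """Return YAML text for a `tikv_servers` scale-out topology."""
    lines = ["tikv_servers:"]
    for i in range(instances):
        port = base_port + i                # each instance needs its own port
        status_port = base_status_port + i  # and its own status port
        lines += [
            f"  - host: {host}",
            f"    port: {port}",
            f"    status_port: {status_port}",
            # Hypothetical layout: one disk mount per instance.
            f"    deploy_dir: /data{i + 1}/tidb-deploy/tikv-{port}",
            f"    data_dir: /data{i + 1}/tidb-data/tikv-{port}",
            "    config:",
            "      server.labels:",
            f"        zone: {zone}",
            f"        host: {host}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the output to a file and pass that file to the scale-out command.
    print(tikv_scale_out_topology("10.1.1.1", "z2"), end="")
```

Keeping the `host` label identical on all four instances is what lets PD avoid placing two replicas of the same Region on the same machine, so it should match the labels used by the rest of the cluster.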

| username: residentevil | Original post link

The recovery process you referred to is quite lengthy overall, and I see it was done on TiDB v5.4.3. Is it easier to remove old nodes and add new ones in newer versions?

| username: 像风一样的男子 | Original post link

That procedure covers reinstalling the TiKV node on the original server. If you have a new machine, it’s simpler: you can scale out directly onto the new node.

| username: xingzhenxiang | Original post link

Force scale-in this node with tiup (`tiup cluster scale-in` with `--force`), then redeploy it by scaling out.
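
A minimal sketch of that sequence, written as a dry run that only prints the `tiup cluster` commands rather than executing them. The cluster name, node addresses, and topology file name are hypothetical. Note that `--force` removes a node without waiting for its data to be migrated, so it is only appropriate when the machine is truly unrecoverable and the remaining replicas are healthy.

```python
import shlex


def recovery_commands(cluster, failed_nodes, topology_file):
    """Build the tiup command lines for force scale-in followed by scale-out."""
    cmds = [["tiup", "cluster", "display", cluster]]  # check current state first
    for node in failed_nodes:
        # --force skips the normal offline/migration wait for a dead node.
        cmds.append(["tiup", "cluster", "scale-in", cluster,
                     "--node", node, "--force"])
    cmds.append(["tiup", "cluster", "scale-out", cluster, topology_file])
    cmds.append(["tiup", "cluster", "display", cluster])  # verify the new stores
    return cmds


if __name__ == "__main__":
    # Hypothetical values: four TiKV instances on the failed host.
    nodes = [f"10.1.1.1:{port}" for port in range(20160, 20164)]
    for cmd in recovery_commands("prod-cluster", nodes, "scale-out-tikv.yaml"):
        print(shlex.join(cmd))
```

If the replacement instances reuse the same address and ports, it is probably worth confirming in PD that the old stores are no longer registered before running the scale-out step.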

| username: residentevil | Original post link

Indeed :+1:

| username: residentevil | Original post link

One more question: can the monitoring panel below [TiKV instance recovery speed] be read from the metrics interface?

| username: 像风一样的男子 | Original post link

We usually just check the monitoring dashboards to see whether the Regions are balanced. I have no experience with the metrics interface, so I’m not sure about it.
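
One possible way to check Region balance without the dashboards is to poll PD's store list and compare per-store Region counts. The sketch below assumes a payload shaped like the PD HTTP API response from `/pd/api/v1/stores` (a `stores` array whose items carry `store.address`, `store.state_name`, and `status.region_count`); these field names should be verified against the actual PD version, and the sample numbers are invented for illustration.

```python
import json


def region_counts(stores_json):
    """Map store address -> (state, region_count) from a PD-style payload."""
    result = {}
    for item in stores_json.get("stores", []):
        store = item.get("store", {})
        status = item.get("status", {})
        result[store.get("address", "?")] = (
            store.get("state_name", "?"),
            status.get("region_count", 0),
        )
    return result


def is_balanced(counts, tolerance=0.1):
    """True if every Up store's Region count is within `tolerance` of the mean."""
    up = [n for state, n in counts.values() if state == "Up"]
    if not up:
        return False
    mean = sum(up) / len(up)
    return all(abs(n - mean) <= tolerance * mean for n in up)


# Invented sample: one long-running store and one freshly scaled-out store.
SAMPLE = json.loads("""
{"stores": [
  {"store": {"address": "10.1.1.2:20160", "state_name": "Up"},
   "status": {"region_count": 1000}},
  {"store": {"address": "10.1.1.1:20160", "state_name": "Up"},
   "status": {"region_count": 400}}
]}
""")

if __name__ == "__main__":
    counts = region_counts(SAMPLE)
    for address, (state, n) in counts.items():
        print(address, state, n)
    print("balanced:", is_balanced(counts))
```

Polling this repeatedly and watching the new store's count approach the others gives a rough picture of recovery progress; the 10% tolerance is an arbitrary choice.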

| username: residentevil | Original post link

Okay, got it.