Startup Failure Caused by raft-engine Deletion

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: raft-engine删除导致启动失败

| username: Chengf01

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.1
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]


The cluster has three TiKV nodes. On one of them, the data in the raft-engine directory was deleted, and the 0000000000000001.rewrite file from node 2 was then copied over to it.

Now, each TiKV log reports the following errors:

Is there any solution?

| username: TiDBer_jYQINSnf | Original post link

If it’s confirmed that only one node is broken, delete that node and rebuild it, since there are 3 replicas. But if all of them are reporting this error, it’s tricky. raft-engine mainly stores the Raft logs, while most of the data is in the db directory, so in theory it can still be recovered through unsafe recovery (unsafe-recover). If the data is not important, don’t bother with it.
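The reasoning about "only one broken out of 3 replicas" comes down to Raft majority arithmetic. A minimal sketch, assuming every Region has exactly 3 replicas spread over the three TiKV nodes (the function names here are illustrative, not part of any TiKV tool):

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def has_quorum(replicas: int, healthy: int) -> bool:
    """True if the healthy replicas can still elect a leader and commit logs."""
    return healthy >= majority(replicas)


if __name__ == "__main__":
    # With 3 replicas, one lost node is tolerable; two lost nodes are not.
    for healthy in (3, 2, 1):
        print(f"healthy={healthy} quorum={has_quorum(3, healthy)}")
```

With one node broken, 2 of 3 replicas remain and the node can simply be rebuilt. Once a second node is lost, the remaining replica cannot form a majority, which is why unsafe recovery comes into the picture.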

| username: Chengf01 | Original post link

Node 3 was damaged and was put directly into the Tombstone state; the data is quite important.

| username: Chengf01 | Original post link

In the end, only node 2 was left, but it couldn’t hold up and crashed. Now, none of the three nodes can start up.

| username: 像风一样的男子 | Original post link

This is a serious failure. I suggest contacting the vendor’s official technical support for repair.

| username: dba远航 | Original post link

The content of the copied file is different from what this node had, for example the Region information recorded in it.

| username: Fly-bird | Original post link

Check if you can take the original faulty node offline, and see if the service can still come up.

| username: 小龙虾爱大龙虾 | Original post link

If it’s not important, don’t reply :rofl:

| username: 江湖故人 | Original post link

Yes, tinkering with it yourself might make it even more unmanageable.

| username: 春风十里 | Original post link

If there is enough disk space, first shut down and make a cold backup to preserve the current state, then research a solution.
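A cold backup along these lines could be sketched as below. This is only an illustration, assuming the TiKV process is already stopped; the function names and any paths passed in are hypothetical and must be adapted to the actual deployment.

```python
import shutil
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files under root, recursively."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def cold_backup(data_dir: Path, backup_dir: Path) -> int:
    """Copy data_dir to backup_dir and return the number of files copied.

    Assumes the TiKV process is stopped. Refuses to overwrite an
    existing backup so the preserved state is never clobbered.
    """
    if backup_dir.exists():
        raise FileExistsError(f"refusing to overwrite {backup_dir}")
    shutil.copytree(data_dir, backup_dir)
    copied = count_files(backup_dir)
    # Cheap sanity check only; it does not verify file contents.
    if copied != count_files(data_dir):
        raise RuntimeError("file count mismatch after copy")
    return copied
```

Running this once per node before any repair attempt keeps a copy of the current state to fall back on if a later step makes things worse.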

| username: Jellybean | Original post link

Does the deletion refer to using a command like “rm -rf” to forcibly erase the data?

| username: 普罗米修斯 | Original post link

The Raft data stored on each TiKV node is different and cannot be used by another node. Remove the copied data and see whether the remaining two TiKV nodes can start. If they can, try scaling out with new nodes.
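As a hedged sketch of "remove the copied data", the copied file can be moved aside instead of deleted, so the step stays reversible. The function name and directory arguments are hypothetical; only the file name 0000000000000001.rewrite comes from this thread, and the TiKV process is assumed to be stopped.

```python
import shutil
from pathlib import Path


def quarantine_file(raft_engine_dir: Path, file_name: str, quarantine_dir: Path) -> Path:
    """Move one file out of the raft-engine directory instead of deleting it."""
    src = raft_engine_dir / file_name
    if not src.is_file():
        raise FileNotFoundError(src)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    dst = quarantine_dir / file_name
    # Never overwrite something that was quarantined earlier.
    if dst.exists():
        raise FileExistsError(dst)
    shutil.move(str(src), str(dst))
    return dst
```

For example, `quarantine_file(Path("<tikv-data-dir>/raft-engine"), "0000000000000001.rewrite", Path("<some-safe-dir>"))`, where both paths are placeholders.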