After the machine rebooted, two KV nodes on one of the machines failed to start. I checked the logs but couldn't understand what they meant. Help needed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 机器重启后,其中一台机上面的两个KV节点就拉不起来了,看了下日志,没太懂是什么意思,求救

| username: LBX流鼻血

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1
After the machine restarted, two KV nodes on one of the machines couldn’t start. I checked the logs but didn’t understand what they meant. Help needed.

| username: xfworld | Original post link

How did you restart it?

I suggest quickly finding spare resources, scaling out two new TiKV nodes, and bringing them into the cluster…
The first priority is to make sure the cluster's data is preserved…

The faulty node can be taken offline and dealt with later.

| username: tidb菜鸟一只 | Original post link

What’s the cluster topology like? How many TiKV nodes are there in total, and how many replicas?

| username: LBX流鼻血 | Original post link

Cluster topology
The two nodes that couldn't start have already been scaled in, but they are still stuck in the Pending Offline state.

| username: LBX流鼻血 | Original post link

@tidb菜鸟一只 Please take a look, expert.

| username: h5n1 | Original post link

This is a scale-in, right? Use pd-ctl store delete or tiup cluster scale-in, then wait for the region migration to complete. You can check the remaining region count via information_schema.tikv_store_status or pd-ctl store.
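As a sketch of the commands above (the PD address, cluster name, and TiKV address are placeholders, and the commands need a live cluster to run against):

```shell
# List store states and region counts via pd-ctl;
# an offline store is done migrating once it turns Tombstone
tiup ctl:v6.1.0 pd -u http://127.0.0.1:2379 store

# The same information through SQL
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SELECT STORE_ID, ADDRESS, STORE_STATE_NAME, REGION_COUNT
   FROM information_schema.tikv_store_status;"

# Scale in through tiup instead of a raw pd-ctl store delete
tiup cluster scale-in <cluster-name> -N 192.168.1.10:20160
```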

| username: TiDBer_嘎嘣脆 | Original post link

Check the number of replicas first; the default is 3. If the scale-in was started in that state, wait for the data migration to complete and keep observing.
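A quick way to check the replica count (PD address is a placeholder; needs a live cluster):

```shell
# Show PD's replication settings; look for "max-replicas" (default 3).
# With only 3 TiKV stores left, regions from an offline store have
# nowhere to migrate, so the scale-in will hang in Pending Offline.
tiup ctl:v6.1.0 pd -u http://127.0.0.1:2379 config show replication
```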

| username: h5n1 | Original post link

TiKV won't start. It's likely this bug: raft engine panic during recovery · Issue #13123 · tikv/tikv · GitHub and overwrite panic during recovery · Issue #250 · tikv/raft-engine · GitHub
Fixed in version 6.1.1: "Fixed an issue where TiKV might panic when Raft Engine concurrent recovery is enabled."
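If it is this bug, upgrading past it is the usual path (cluster name is a placeholder; requires a running tiup-managed cluster):

```shell
# Upgrade the cluster to a version containing the raft-engine fix
tiup cluster upgrade <cluster-name> v6.1.1
```

Note that a TiKV instance that panics on startup may not be able to participate in a rolling upgrade; in that case the faulty instances may need to be scaled in first and re-added after the upgrade.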

| username: tidb菜鸟一只 | Original post link

You are already scaling in, so just wait for it to finish before scaling out again.

| username: cy6301567 | Original post link

It is best to follow the official recommendations for resource allocation and deployment, and avoid mixing everything on a single machine.

| username: zhanggame1 | Original post link

Mixed deployment is possible, just label it properly.
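A minimal topology sketch of that labeling (hostnames, IPs, and label values are made up):

```yaml
# Tell PD which instances share a machine so it never places two
# replicas of the same region on the same host
server_configs:
  pd:
    replication.location-labels: ["host"]

tikv_servers:
  - host: 192.168.1.10
    port: 20160
    config:
      server.labels: { host: "tikv-machine-1" }
  - host: 192.168.1.10
    port: 20161
    config:
      server.labels: { host: "tikv-machine-1" }
```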

| username: WalterWj | Original post link

How did you mount your disk? Check by running cat /etc/fstab.
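To spell that check out (the device and mount point below are examples; actual entries will differ per machine):

```shell
# Check whether the data disk is in fstab with the recommended options.
# TiDB docs suggest ext4 with nodelalloc,noatime, e.g.:
#   /dev/nvme0n1 /data1 ext4 defaults,nodelalloc,noatime 0 2
# A data disk missing from fstab won't come back after a reboot.
cat /etc/fstab

# Verify what is actually mounted right now
mount | grep /data1
```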

| username: Billmay表妹 | Original post link

Scale out first, then scale in~

| username: redgame | Original post link

Follow the official recommendations when allocating resources for deployment: scale out, and protect the data.