How to Perform an Emergency Cluster Repair

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何紧急修复集群

| username: TiDBer_Lee

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v6.5.3
The cluster is a simple 3-machine TiDB deployment, with one tidb-server, one PD, and one TiKV on each machine. Two of the machines have gone down, and their disks are full. After replacing those two machines with new ones and mounting the original disks on them, how can I repair the cluster?

I tried to repair it with pd-recover, but it did not work.
Since the cluster was unusable, I reinstalled it, stopped it, mounted the disks, and ran pd-recover on one machine, which succeeded. However, only that PD could start; the others could not.
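
For reference, pd-recover takes the original cluster ID and an alloc ID larger than any ID that PD has already allocated, and both values can usually be found in the old PD logs. The sketch below is a minimal illustration, not a definitive procedure: it assumes the log message formats shown in the comments (these may differ between versions), the endpoint is a placeholder, and the function names and the safety margin are hypothetical.

```python
import re

# Assumed PD log fragments (formats may vary by version):
#   ... ["init cluster id"] [cluster-id=6747551640615446306]
#   ... ["idAllocator allocates a new id"] [alloc-id=4000]
CLUSTER_ID_RE = re.compile(r"\[cluster-id=(\d+)\]")
ALLOC_ID_RE = re.compile(r"\[alloc-id=(\d+)\]")


def recover_params(log_lines, margin=10000):
    """Return (cluster_id, alloc_id) derived from old PD log lines.

    alloc_id is the largest allocated ID seen plus a hypothetical
    safety margin, so that the new value exceeds every old one.
    """
    cluster_id = None
    max_alloc = 0
    for line in log_lines:
        if "init cluster id" in line:
            m = CLUSTER_ID_RE.search(line)
            if m:
                cluster_id = int(m.group(1))
        if "allocates a new id" in line:
            m = ALLOC_ID_RE.search(line)
            if m:
                max_alloc = max(max_alloc, int(m.group(1)))
    if cluster_id is None:
        raise ValueError("cluster id not found in the supplied log lines")
    return cluster_id, max_alloc + margin


def build_command(endpoint, cluster_id, alloc_id):
    """Build the pd-recover argument list (endpoint is a placeholder)."""
    return [
        "pd-recover",
        "-endpoints", endpoint,
        "-cluster-id", str(cluster_id),
        "-alloc-id", str(alloc_id),
    ]


if __name__ == "__main__":
    sample = [
        '[INFO] [server.go:212] ["init cluster id"] [cluster-id=6747551640615446306]',
        '[INFO] [id.go:91] ["idAllocator allocates a new id"] [alloc-id=4000]',
    ]
    cid, aid = recover_params(sample)
    print(" ".join(build_command("http://127.0.0.1:2379", cid, aid)))
```

The printed command is meant to be reviewed by hand before it is run against a freshly started PD; the script itself never executes pd-recover.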

| username: caiyfc | Original post link

You can try to fix it based on this post: [SOP Series 12] TiUP Modify Cluster IP, Port, and Directory (asktug.com)

| username: 大飞哥online | Original post link

Since the new machines have new IPs, you need to update the configuration file.

| username: tidb菜鸟一只 | Original post link

This is a bit troublesome. If you are able to change IPs, my suggestion is to release the IPs of the two failed machines, assign those same IPs to the new machines, and mount the original disks on them. Then try starting the cluster directly and see whether it comes up. If you can't change the IPs, the only option is to follow the procedure for migrating a cluster to new IPs, and PD will need to be rebuilt.
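
The choice described above comes down to whether the replacement machines end up with the same IPs as the original ones. A minimal sketch of that check, using made-up addresses and a hypothetical `plan_recovery` helper:

```python
def plan_recovery(old_hosts, new_hosts):
    """Pick a recovery path by comparing old and new host IPs.

    Returns (action, changed), where changed lists the old IPs that
    are no longer present among the new hosts.
    """
    old, new = set(old_hosts), set(new_hosts)
    changed = sorted(old - new)
    if not changed and len(old) == len(new):
        # Same addresses as before: try starting the cluster as-is.
        return "start-directly", changed
    # Addresses differ: follow the IP migration procedure and rebuild PD.
    return "migrate-ips-and-rebuild-pd", changed


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    old = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]
    print(plan_recovery(old, ["10.0.1.1", "10.0.1.2", "10.0.1.3"]))
    print(plan_recovery(old, ["10.0.1.1", "10.0.1.8", "10.0.1.9"]))
```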

| username: ffeenn | Original post link

I ran into a similar issue in my test environment. You can refer to this post and see if it helps:
Recovery Process After a Network Failure Forced a Collective IP Migration (asktug.com)

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.