Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb中控机系统损坏重装系统后如何重新管理现有集群 (How to re-manage an existing cluster after the TiDB control machine's OS is damaged and reinstalled)
【TiDB Usage Environment】Production / Test Environment / POC
【TiDB Version】5.4
【Problem Phenomenon and Impact】The control machine's operating system crashed. After reinstalling the OS, it can no longer manage the existing cluster. We need to restore management of the existing cluster and separately reinitialize the monitoring components that were deployed on the control machine.
There are quite a few similar posts; you can search for and refer to them. Additionally, it is recommended to back up tiup regularly.
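For the regular backup, a minimal sketch assuming tiup lives in the default ~/.tiup of the user that operates the cluster; the backup directory is just a placeholder:

```shell
# Hypothetical cron entry for the user that runs tiup:
# archive ~/.tiup (cluster metadata, SSH keys, binaries) once a day.
# /data/backup is a placeholder destination, adjust to your environment.
0 2 * * * tar czf /data/backup/tiup-$(date +\%F).tar.gz -C $HOME .tiup
```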
After restoring tiup, you can use tiup to scale out nodes to reinstall the monitoring components and so on.
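For example, once tiup can see the cluster again, scaling the monitoring stack back out might look like this (the host IP, file name, and cluster name are placeholders, not from the original thread):

```shell
# Sketch only: assumes the tiup metadata for <cluster-name> has been restored
# and that 10.0.1.5 is a free host for the monitoring stack (placeholder IP).
cat > scale-out-monitoring.yaml <<'EOF'
monitoring_servers:
  - host: 10.0.1.5
grafana_servers:
  - host: 10.0.1.5
alertmanager_servers:
  - host: 10.0.1.5
EOF

tiup cluster scale-out <cluster-name> scale-out-monitoring.yaml
```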
Deploy won’t overwrite the current cluster and data, right?
I feel that the central control machine also needs to be made highly available (HA).
Back up the tiup directory on the control machine and copy it to the new control machine.
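A minimal sketch of that copy-and-restore, assuming tiup lives in the default ~/.tiup of the tidb user; hostnames and the cluster name are placeholders:

```shell
# On the old control machine (or from an existing backup archive):
tar czf tiup-backup.tar.gz -C ~ .tiup

# Copy the archive to the new control machine and unpack it
# into the home directory of the user that will run tiup.
scp tiup-backup.tar.gz tidb@new-control-host:~
ssh tidb@new-control-host 'tar xzf ~/tiup-backup.tar.gz -C ~'

# Verify that the cluster metadata is visible again.
ssh tidb@new-control-host 'tiup cluster list && tiup cluster display <cluster-name>'
```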
Data will not be overwritten. Since the current tiup has no metadata for the cluster, deployment will not report any directory or port conflicts. The deploy command will download the binaries for the specified version and overwrite the cluster's existing binaries. (Because this is an unconventional redeployment, overwriting the binaries can be treated as a normal operation; the upgrade command backs up the bin directory anyway.)
Currently, the cluster is in a running state. When performing a deploy operation, do I need to stop the cluster first, deploy, and then start it again, or can I just keep it as it is?
If the tiup directory was not backed up, it is a bit more awkward. You can create a tiup directory yourself and then edit the files in it to match the actual information of each node.
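A heavily hedged sketch of what rebuilding the metadata by hand might look like; the path is the standard tiup cluster metadata location, but the exact field layout should be verified against a healthy cluster's meta.yaml rather than taken from here, and every host, directory, and version below is a placeholder:

```shell
# Heavily hedged sketch: rebuild the cluster metadata by hand when no backup exists.
mkdir -p ~/.tiup/storage/cluster/clusters/<cluster-name>
cat > ~/.tiup/storage/cluster/clusters/<cluster-name>/meta.yaml <<'EOF'
user: tidb                       # OS user that owns the deploy directories on every node
tidb_version: v5.4.0
topology:
  global:
    user: tidb
    ssh_port: 22
    deploy_dir: /tidb-deploy     # must match the real deploy_dir on each node
    data_dir: /tidb-data         # must match the real data_dir on each node
  pd_servers:
    - host: 10.0.1.1             # placeholder IPs; fill in the real node list
  tidb_servers:
    - host: 10.0.1.2
  tikv_servers:
    - host: 10.0.1.3
EOF
# tiup also needs passwordless SSH from the new control machine to every node
# for the user above before `tiup cluster display <cluster-name>` will work.
```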
There is no impact on the running service. I found this FAQ, which has more detailed steps:
Deploy is not the way to go; it will generate files on each node.