Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb中控机系统损坏重装系统后如何重新管理现有集群 (How to re-manage an existing cluster after the TiDB control machine's OS is damaged and reinstalled)
【TiDB Usage Environment】Production / Test Environment / POC
【TiDB Version】5.4
【Problem Phenomenon and Impact】The control machine's operating system crashed. After reinstalling the OS, it can no longer manage the existing cluster. We need to restore management of the existing cluster and separately reinitialize the monitoring components that were deployed on the control machine.
There are quite a few similar posts; you can search for and refer to them. Additionally, it is recommended to back up tiup regularly.
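For the regular backup, a minimal sketch assuming tiup lives in the default ~/.tiup of the user that operates the cluster; the backup directory is just a placeholder:

```shell
# Hypothetical cron entry for the user that runs tiup:
# archive ~/.tiup (cluster metadata, SSH keys, binaries) once a day.
# /data/backup is a placeholder destination, adjust to your environment.
0 2 * * * tar czf /data/backup/tiup-$(date +\%F).tar.gz -C $HOME .tiup
```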
After restoring tiup, you can use tiup to scale out nodes to reinstall the monitoring components and so on.
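For example, once tiup can see the cluster again, scaling the monitoring stack back out might look like this (the host IP, file name, and cluster name are placeholders, not from the original thread):

```shell
# Sketch only: assumes the tiup metadata for <cluster-name> has been restored
# and that 10.0.1.5 is a free host for the monitoring stack (placeholder IP).
cat > scale-out-monitoring.yaml <<'EOF'
monitoring_servers:
  - host: 10.0.1.5
grafana_servers:
  - host: 10.0.1.5
alertmanager_servers:
  - host: 10.0.1.5
EOF

tiup cluster scale-out <cluster-name> scale-out-monitoring.yaml
```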
Deploy won’t overwrite the current cluster and data, right?
I feel that the central control machine also needs to be made highly available (HA).
Back up the tiup directory on the control machine and copy it to the new control machine.
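A minimal sketch of that copy-and-restore, assuming tiup lives in the default ~/.tiup of the tidb user; hostnames and the cluster name are placeholders:

```shell
# On the old control machine (or from an existing backup archive):
tar czf tiup-backup.tar.gz -C ~ .tiup

# Copy the archive to the new control machine and unpack it
# into the home directory of the user that will run tiup.
scp tiup-backup.tar.gz tidb@new-control-host:~
ssh tidb@new-control-host 'tar xzf ~/tiup-backup.tar.gz -C ~'

# Verify that the cluster metadata is visible again.
ssh tidb@new-control-host 'tiup cluster list && tiup cluster display <cluster-name>'
```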
Data will not be overwritten. Since the current tiup has no metadata for the cluster, deployment will not report any directory or port conflicts. The deploy command will download the binaries for the specified version and overwrite the cluster's existing binaries. (Because this is an unconventional redeployment, overwriting the binaries can be treated as a normal operation; the upgrade command backs up the bin directory anyway.)
Currently, the cluster is in a running state. When performing a deploy operation, do I need to stop the cluster first, deploy, and then start it again, or can I just keep it as it is?
If the tiup directory was not backed up, it is a bit more awkward. You can create a tiup directory yourself and then edit the files in it to match the actual information of each node.
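A heavily hedged sketch of what rebuilding the metadata by hand might look like; the path is the standard tiup cluster metadata location, but the exact field layout should be verified against a healthy cluster's meta.yaml rather than taken from here, and every host, directory, and version below is a placeholder:

```shell
# Heavily hedged sketch: rebuild the cluster metadata by hand when no backup exists.
mkdir -p ~/.tiup/storage/cluster/clusters/<cluster-name>
cat > ~/.tiup/storage/cluster/clusters/<cluster-name>/meta.yaml <<'EOF'
user: tidb                       # OS user that owns the deploy directories on every node
tidb_version: v5.4.0
topology:
  global:
    user: tidb
    ssh_port: 22
    deploy_dir: /tidb-deploy     # must match the real deploy_dir on each node
    data_dir: /tidb-data         # must match the real data_dir on each node
  pd_servers:
    - host: 10.0.1.1             # placeholder IPs; fill in the real node list
  tidb_servers:
    - host: 10.0.1.2
  tikv_servers:
    - host: 10.0.1.3
EOF
# tiup also needs passwordless SSH from the new control machine to every node
# for the user above before `tiup cluster display <cluster-name>` will work.
```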
There is no impact on the running service. I found this FAQ, which has more detailed steps:
Deploy is not the way to go; it will generate files on each node.