Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
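Original topic: The servers need to be restarted for a security upgrade and the cluster is deployed with tiup. Is there a required order for restarting the servers? Can they be restarted one at a time without affecting normal cluster operation?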
The servers need to be restarted for a security upgrade to take effect. The TiDB cluster is deployed using tiup. The environment is limited, with 2 PD and TiDB on the same host, and the remaining 1 PD and 3 TiKV deployed separately. If the servers are restarted one at a time, waiting for each one to come back up successfully before moving on, will the cluster keep running normally? If all of them are restarted, will the cluster start automatically? Do I need to follow a particular restart order for the servers?
You can first stop the service on the specified machine, confirm that there are no issues, and then restart the machine.
First, stop the cluster, wait for the machine to restart, and then start the cluster.
If you stop the cluster yourself, will it not start automatically after the machine restarts? Do you need to run tiup cluster start xx again?
The suggestion is to stop the cluster before restarting the service.
Theoretically, with this configuration, stopping and restarting the service one by one should not affect the normal operation of the cluster.
What you’re considering is correct; I forgot that it would start automatically. To disable automatic startup, use tiup cluster disable (see "tiup cluster disable" in the PingCAP Documentation Center).
To enable automatic startup again, use tiup cluster enable (see "tiup cluster enable" in the PingCAP Documentation Center).
“3 TiKV nodes deployed separately” means that each TiKV is on an independent machine, so restarting one of those machines at a time doesn’t really have any impact. If they are all on one machine, it is still recommended to stop the cluster first.
If a full restart is not possible, you can actually restart one machine at a time.
With 2 PD and TiDB on the same host, and the remaining 1 PD and 3 TiKV deployed separately (with three replicas of TiKV), assume the environment looks like this:
1. PD/TiDB-server
2. PD/TiDB-server
3. PD
4. TiKV
5. TiKV
6. TiKV
You can restart one machine at a time in the order of 1/2/3/4/5/6, but you must ensure that after a machine restarts, the node is functioning normally before restarting other machines. For example, after restarting 1, you need to ensure that both PD and TiDB-server on 1 are functioning normally before restarting 2. For 4/5/6, you can only restart one at a time, and you must wait for the TiKV on the restarted node to start up completely before proceeding to the next node.
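The one-machine-at-a-time procedure above can be sketched as a small plan generator. This is only an illustration under stated assumptions: the host addresses, the default ports (2379 for PD, 4000 for TiDB, 20160 for TiKV) and the cluster name xx are placeholders that do not come from this thread, and tiup cluster display is used here as one way to confirm node status.

```python
from typing import Dict, List

# Hypothetical hosts and ports for the six machines listed above.
TOPOLOGY: Dict[str, List[str]] = {
    "10.0.0.1": ["10.0.0.1:2379", "10.0.0.1:4000"],  # PD + TiDB
    "10.0.0.2": ["10.0.0.2:2379", "10.0.0.2:4000"],  # PD + TiDB
    "10.0.0.3": ["10.0.0.3:2379"],                   # PD
    "10.0.0.4": ["10.0.0.4:20160"],                  # TiKV
    "10.0.0.5": ["10.0.0.5:20160"],                  # TiKV
    "10.0.0.6": ["10.0.0.6:20160"],                  # TiKV
}


def rolling_restart_plan(cluster: str, topology: Dict[str, List[str]]) -> List[str]:
    """Return the steps for restarting one machine at a time, in dict order."""
    steps: List[str] = []
    for host, nodes in topology.items():
        # Stop only the nodes on this host (-N), never a whole role (-R).
        steps.append(f"tiup cluster stop {cluster} -N {','.join(nodes)}")
        steps.append(f"# reboot {host} and wait for it to come back")
        # Move on only after every node on this host shows as Up again.
        steps.append(f"tiup cluster display {cluster}")
    return steps


if __name__ == "__main__":
    for step in rolling_restart_plan("xx", TOPOLOGY):
        print(step)
```

The plan only prints commands; running them, and judging whether a node is really healthy before moving on, is still a manual decision.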
Thank you very much. The control machine is on the first PD/TiDB-server host, and the cluster was initially deployed and started through tiup. Following this order, can we skip stopping components from the control machine (tiup cluster stop xx -R pd) and just restart the servers directly? After each restart we would check the cluster status and, once every component is confirmed online, proceed to the next server. Do at least two of the three TiKV servers need to be online? According to the earlier reply, if everything is restarted together, it is best to make sure the PD servers start first, so that PD is online when TiKV starts.
On the control machine, stop the node (tiup cluster stop xx -N 1.1.1.1:7390), then confirm the node status. Use -N to stop only the node on the machine you are about to restart, not -R pd, which would stop all PD instances. For example, if you have 3 PD instances and stop one, the remaining PD instances keep serving, and if the stopped instance was the leader, one of them is automatically elected the new PD leader. Once the PD on this node is stopped and the PD on the other nodes is confirmed normal, you can restart the machine that hosts this PD.
For tidb-server, as long as there is at least one up node, it can provide external services.
For TiKV, which generally keeps three replicas, you can check the max-replicas parameter in the PD configuration. With three replicas, at least two must be online to provide normal external services, so make sure that at least two TiKV nodes are online at all times.
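The "at least two of three" rule is ordinary majority arithmetic, which can be sketched as follows. The function names are hypothetical helpers for illustration; the sketch assumes a majority-based group such as three TiKV replicas or three PD members, and it does not query a real cluster.

```python
def majority(replicas: int) -> int:
    """Smallest number of members that still forms a majority."""
    return replicas // 2 + 1


def tolerable_failures(replicas: int) -> int:
    """How many members can be offline while a majority remains."""
    return replicas - majority(replicas)


def safe_to_restart_one(replicas: int, members_up: int) -> bool:
    """True if taking one more member offline still leaves a majority."""
    return members_up - 1 >= majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, majority(n), tolerable_failures(n))
```

With three replicas, safe_to_restart_one(3, 3) is true and safe_to_restart_one(3, 2) is false, which is why a second TiKV machine should not be restarted until the first one is fully back.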
If you are restarting the entire cluster, it is recommended to follow the advice of the previous colleague: first disable automatic startup, stop the cluster, and then restart the machines. This is because the TiDB cluster has a specific start order: PD→TiKV→TiDB. If the machines come back up at different times, the automatic startup of the TiDB cluster may not succeed. When starting the cluster, you don’t need to start each role individually. You can wait until all machines have restarted and then use tiup cluster start <cluster_name> to start the cluster. However, the prerequisite is that all machines have finished booting. If a machine hosting a PD node hasn’t finished starting, the cluster won’t be able to start.
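For the whole-cluster case, the sequence described above can be written out as an ordered list of steps. This is a sketch rather than a verified runbook: the cluster name is a placeholder, and the final enable step is an assumption based on the earlier reply about turning automatic startup back on.

```python
from typing import List


def full_restart_plan(cluster: str) -> List[str]:
    """Ordered steps for rebooting every machine in the cluster."""
    return [
        # Turn off start-on-boot so components do not race each other.
        f"tiup cluster disable {cluster}",
        f"tiup cluster stop {cluster}",
        "# reboot every machine and wait until all of them are back",
        # tiup starts the components in its own order (PD first).
        f"tiup cluster start {cluster}",
        # Assumed final step: restore start-on-boot afterwards.
        f"tiup cluster enable {cluster}",
    ]


if __name__ == "__main__":
    for step in full_restart_plan("xx"):
        print(step)
```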
One more thing: will the components start automatically after a single server restarts, or do we still need to run tiup cluster start xx -N 1.2.3.4:2379 manually? Auto-start should bring up the components on that node automatically, right?
The components installed by tiup are all registered as services and will start with the server.
Yes. When you install the TiDB cluster, a systemd unit such as tidb-4000.service is generated on each machine, and each unit is set to start on boot with a command like systemctl enable tidb-4000.service. When you restart the machine, it will automatically restart the components…
Restart PD and TiDB one by one.
The sequence can actually be referenced here:
The cluster startup operation will start all components of the entire TiDB cluster in the order of PD → TiKV → Pump → TiDB → TiFlash → Drainer → TiCDC → Prometheus → Grafana → Alertmanager.
The cluster shutdown operation will shut down all components of the entire TiDB cluster in the order of Alertmanager → Grafana → Prometheus → TiCDC → Drainer → TiFlash → TiDB → Pump → TiKV → PD (and will also shut down the monitoring components).
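The two orders quoted above are exact mirrors of each other, which a short sketch makes explicit. The helper in_start_order is a hypothetical name used only for illustration; the component list is copied from the quoted text.

```python
from typing import Iterable, List

# Startup order as quoted above.
START_ORDER: List[str] = [
    "PD", "TiKV", "Pump", "TiDB", "TiFlash",
    "Drainer", "TiCDC", "Prometheus", "Grafana", "Alertmanager",
]

# Shutdown is the same list reversed.
STOP_ORDER: List[str] = list(reversed(START_ORDER))


def in_start_order(components: Iterable[str]) -> List[str]:
    """Sort the given component names into startup order."""
    wanted = set(components)
    return [c for c in START_ORDER if c in wanted]


if __name__ == "__main__":
    print(" -> ".join(in_start_order(["TiDB", "TiKV", "PD"])))
    print(" -> ".join(STOP_ORDER))
```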