PD and monitoring node servers are being downsized and need a reboot: what is the correct procedure for the cluster?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: PD节点和监控节点服务器缩容,需要重启服务器,集群正确操作是?

| username: 我是咖啡哥

[TiDB Usage Environment] Production Environment
[TiDB Version] V6.1.1
[Problem Description]
To control costs, we need to downsize production resources with low utilization. We plan to reduce the monitoring node servers from 16 cores/32 GB to 8 cores/16 GB and the PD node servers from 8 cores/16 GB to 4 cores/8 GB.
The servers must be rebooted for the new specs to take effect.
What is the correct way to handle this for the TiDB cluster?
Option 1: Do not stop the cluster
Keep the cluster running and reboot the servers one at a time (monitoring nodes, then PD nodes).
Will the services come back up automatically after a server reboots? Are there any risks?
Option 2: Stop the cluster
Manually stop the cluster first, reboot the servers, and then start the cluster again once the servers are back up.

If anyone has experience with this, please give me some guidance!

| username: 裤衩儿飞上天 | Original post link

In production, I think it’s better to be conservative.
Scale out with new nodes first, then scale in the old ones.

| username: 我是咖啡哥 | Original post link

Applying for new resources is a complicated process, so that is unlikely to happen. The whole point was to reduce costs. :sweat_smile:

| username: h5n1 | Original post link

Downsize them one at a time, stopping each node as you go.

| username: 裤衩儿飞上天 | Original post link

Then just go ahead and stop them one by one :face_with_peeking_eye:
As long as it’s a normal shutdown and not a power loss, it is generally fine.

| username: 啦啦啦啦啦 | Original post link

Stop and downsize them one at a time; it’s not a big problem. No data is involved anyway, so it should be quick.

| username: 我是咖啡哥 | Original post link

All the preparations were done, and now we are not downsizing after all. The servers were bought with vouchers, and downsizing won’t refund any money. Awkward :joy:

| username: Kongdom | Original post link

:+1: :+1: :+1: Truly envious.
I think you should first use tiup to stop the corresponding node, and then restart the corresponding node server.
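A minimal sketch of that stop-then-reboot order, written as a Python helper that only builds the list of steps and runs nothing. The cluster name `prod-cluster` and the node addresses are hypothetical; `tiup cluster stop`/`start` with `-N` and `tiup cluster display` are standard TiUP subcommands, and the reboot itself is left as a manual placeholder because how the resize is applied depends on the hosting provider.

```python
from typing import List


def rolling_restart_plan(cluster: str, nodes: List[str]) -> List[str]:
    """Build per-node steps: stop via tiup, reboot the server, start, verify."""
    plan: List[str] = []
    for node in nodes:
        # Stop only this node; the rest of the cluster keeps serving.
        plan.append(f"tiup cluster stop {cluster} -N {node}")
        # Placeholder: the resize/reboot happens outside tiup.
        plan.append(f"# resize and reboot the server hosting {node} (manual step)")
        # Bring the node back and confirm its status before moving on.
        plan.append(f"tiup cluster start {cluster} -N {node}")
        plan.append(f"tiup cluster display {cluster}")
    return plan


if __name__ == "__main__":
    # Hypothetical cluster name and PD node addresses (host:port).
    for step in rolling_restart_plan("prod-cluster", ["10.0.1.11:2379", "10.0.1.12:2379"]):
        print(step)
```

Handling one node per iteration and checking `display` before continuing keeps a PD majority online throughout, which is the point of going node by node.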

| username: ffeenn | Original post link

I have downsized server specs on a live cluster before, including the monitoring node and TiKV. First, I stopped and restarted the monitoring node using tiup, then stopped and restarted TiKV directly. It didn’t have much impact on the cluster, and I did it at 5 AM. :rofl:

| username: 人如其名 | Original post link

I feel that stopping PD and waiting for the leader to switch automatically after a timeout will still affect the business as a whole. Is there a command to switch the PD leader directly? That should keep the impact on the business minimal.

| username: Raymond | Original post link

Yes, you can specify the new leader manually.
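One way to do that, as far as I know, is pd-ctl's `member leader transfer`, run before stopping the PD node that currently holds leadership. The sketch below only builds the command strings and runs nothing; the version `v6.1.1` comes from the original post, while the PD address and the member name `pd-2` are hypothetical.

```python
from typing import List


def pd_leader_transfer_plan(version: str, pd_url: str, target_member: str) -> List[str]:
    """Build pd-ctl commands to move PD leadership before stopping the old leader."""
    ctl = f"tiup ctl:{version} pd -u {pd_url}"
    return [
        f"{ctl} member leader show",                       # check the current leader
        f"{ctl} member leader transfer {target_member}",   # hand leadership over
        f"{ctl} member leader show",                       # confirm the new leader
    ]


if __name__ == "__main__":
    # Hypothetical PD endpoint and target member name.
    for cmd in pd_leader_transfer_plan("v6.1.1", "http://10.0.1.11:2379", "pd-2"):
        print(cmd)
```

The target should be the `name` of a PD member that stays online, as listed by pd-ctl's `member` command.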
