Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 扩容问题 (Scale-out issue)
[Test Environment for TiDB] Testing
[TiDB Version] 6.5.1
[Reproduction Path] Adding a PD
[Encountered Problem: Phenomenon and Impact] When executing scale-out, an error occurs: "error": "no endpoint available, the last err was: Get \"http://192.168.46.101:2379/pd/api/v1/config/replicate\": dial tcp 192.168.46.101:2379: connect: connection refused"
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Could you please share the cluster topology and the configuration file for scaling out?
At the beginning, there were 3 PD servers. Yesterday, two of the PD servers failed simultaneously, so we prepared to add another PD server to get the cluster running again.
Does it mean that the cluster is not in an up state now?
It looks like your cluster is in a stopped state, so you can’t scale it. You need to start the cluster first, and then scale the PD nodes.
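To confirm that nothing is listening on the PD client port named in the error, a plain TCP probe is enough. This is only a minimal sketch: the helper name `pd_port_open` is made up for illustration, and the address in the usage comment is the one from the error message above.

```python
import socket


def pd_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (address taken from the error message in this thread):
#   pd_port_open("192.168.46.101", 2379)
```

If this returns False for every PD address, no PD process is accepting connections, which matches the "connection refused" in the scale-out error.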
Currently, there is only one PD left. I have successfully removed the two problematic PDs by performing a scale-in. However, when starting the cluster, the TiKV nodes still try to connect to a PD that no longer exists, so none of the TiKV nodes can start.
When you originally scaled in, the cluster was running, right?
Also, you need to start the cluster first; it cannot be fixed while it is stopped.
Scale out while keeping one PD online.
Could you please share the error message for us to take a look?
Let’s see if a single PD can start successfully.
Most likely it can. Check the PD address specified in the run_tikv.sh file under the TiKV deployment directory and change it to the address of the PD that is still running. If that still doesn't work, you may need to use pd-recover to recover the cluster.
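If there are many TiKV nodes, editing each run_tikv.sh by hand is error-prone, so here is a rough sketch of how the PD address could be rewritten programmatically. It assumes the script passes the PD list as a double-quoted value after a `--pd` flag; please check your own file first, since the layout may differ. The function name `rewrite_pd_endpoints` and the address `192.168.46.103:2379` in the usage comment are hypothetical, not taken from this thread.

```python
import re


def rewrite_pd_endpoints(script_text: str, live_endpoints: list[str]) -> str:
    """Replace the quoted value of the --pd flag with the given endpoints.

    Assumes the script contains something like: --pd "host1:2379,host2:2379"
    """
    if not live_endpoints:
        raise ValueError("at least one PD endpoint is required")
    new_value = ",".join(live_endpoints)
    # Match '--pd' followed by whitespace and a double-quoted value.
    pattern = re.compile(r'(--pd\s+)"[^"]*"')
    updated, count = pattern.subn(
        lambda m: f'{m.group(1)}"{new_value}"', script_text
    )
    if count == 0:
        raise ValueError("no --pd flag with a quoted value was found")
    return updated


# Example usage (hypothetical surviving PD address):
#   with open("run_tikv.sh") as f:
#       text = f.read()
#   print(rewrite_pd_endpoints(text, ["192.168.46.103:2379"]))
```

The sketch only returns the new text and does not write the file, so the result can be reviewed and the original script backed up before anything is changed.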