username: TiDBer_Terry261

[Test Environment for TiDB] Testing
[TiDB Version] 6.5.1
[Reproduction Path] Adding a PD
[Encountered Problem: Phenomenon and Impact] Running scale-out fails with the error: "error": "no endpoint available, the last err was: Get \"\": dial tcp connect: connection refused"
[Resource Configuration]
username: CuteRay

Could you please share the cluster topology and the configuration file for scaling out?

username: TiDBer_Terry261

The cluster originally had 3 PD servers. Yesterday, two of them failed at the same time, so we planned to add another PD server to get the cluster running again.

username: TiDBer_Terry261

topology.yaml (11.1 KB)

username: Kongdom

Does that mean the cluster is currently not up?

username: CuteRay

It looks like your cluster is stopped, so you can’t scale it out. You need to start the cluster first and then scale out the PD nodes.
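
The "connection refused" in the original error suggests that nothing is listening on the PD endpoint being dialed. A minimal sketch for checking that from the control machine, assuming plain TCP reachability is a good enough first signal; the function name is made up for illustration, and 2379 is only PD's default client port, so substitute whatever your topology uses:

```python
import socket


def pd_endpoint_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port. It does not
    prove that the PD service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `pd_endpoint_reachable("<pd-host>", 2379)` returning `False` for every PD in the topology would be consistent with the "no endpoint available" message.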

username: TiDBer_Terry261

Currently, only one PD is left. I successfully removed the two faulty PDs with a scale-in. However, when the cluster starts, the TiKV nodes still try to connect to a PD that no longer exists, so none of the TiKV nodes can start.

username: CuteRay

Was the cluster running when you did the scale-in?
Also, start the cluster first. There is no way to fix this while it is stopped.

username: tidb菜鸟一只

Scale out while keeping one PD online.

username: Kongdom

Could you please share the error message for us to take a look?

username: 考试没答案

Please display the cluster status and share the output.

username: 考试没答案

Let’s see if a single PD can start successfully.
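
Some background on why that is worth testing: PD embeds etcd, which can only serve requests while a majority of its registered members are healthy. A minimal sketch of that arithmetic, assuming the surviving PD still counts the two failed members in its membership list (the thread does not confirm this, and the function names are illustrative only):

```python
def quorum_size(members: int) -> int:
    """Smallest number of healthy members that forms a majority."""
    return members // 2 + 1


def has_quorum(members: int, healthy: int) -> bool:
    """Return True if the healthy members can still form a majority."""
    return healthy >= quorum_size(members)
```

Under that assumption, `has_quorum(3, 1)` is `False`, so the remaining PD could not elect a leader on its own, while a PD group that really has a single member (`has_quorum(1, 1)`) can.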

username: h5n1

That is most likely the cause. Check the PD address specified in the file under the TiKV deployment directory and change it to the address of the current PD. If it still doesn’t work, you may need to use pd-recover to recover the cluster.
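
The check suggested above can be sketched as a small scan for leftover references to the removed PDs. This is only an illustration: the function name, the file suffixes it looks at, and the idea that the address appears as plain text in a script or config file are all assumptions, and the deployment directory and addresses must come from your own topology.

```python
from pathlib import Path

# Assumed set of text files worth scanning; adjust for your deployment.
TEXT_SUFFIXES = {".sh", ".toml", ".yaml", ".yml"}


def find_pd_references(deploy_dir, stale_addrs):
    """List lines under deploy_dir that mention any of the stale PD addresses.

    Returns a list of (file path, line number, stripped line) tuples.
    """
    hits = []
    for path in sorted(Path(deploy_dir).rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # Unreadable file: skip it rather than abort the scan.
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(addr in line for addr in stale_addrs):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running it against each TiKV deployment directory with the addresses of the two removed PDs lists the lines to edit by hand. The sketch deliberately does not rewrite anything itself, so every change can be reviewed before TiKV is restarted.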