TiDB Cluster Scale-Out: Some Nodes Abnormal During tiup cluster reload tidb-pro

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群扩容tiup cluster reload tidb-pro部分节点异常

| username: johnnnyli

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 5.4.0
Error: init config failed: hdp010:2379: transfer from /root/.tiup/storage/cluster/clusters/tidb-pro/config-cache/pd-hdp010-2379.service to /tmp/pd_d50874cb-551b-46d6-85b5-80481e0f25ac.service failed: failed to scp /root/.tiup/storage/cluster/clusters/tidb-pro/config-cache/pd-hdp010-2379.service to tidb@hdp010:/tmp/pd_d50874cb-551b-46d6-85b5-80481e0f25ac.service: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

Passwordless SSH trust has been set up between the root and tidb users, and both have sudo privileges. scp also works when run manually, but only when the port is specified with -P 9922, since the servers do not listen on the default port 22; could this be related? Also, the failure only affects roles on the existing nodes, while the new nodes and their roles are fine.
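For reference, tiup resolves the SSH port from the cluster topology rather than from any ssh client config, so a non-default port has to be declared there. A minimal sketch of a topology file with a global ssh_port (values other than the port and host name are illustrative):

```yaml
# Global defaults applied to every host in the cluster.
# ssh_port must match the sshd port on the target machines (9922 here, not 22).
global:
  user: "tidb"
  ssh_port: 9922
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: hdp010   # inherits ssh_port 9922 from the global section
```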

| username: tidb菜鸟一只 | Original post link

Scaling out alone should be enough for expansion; why run reload as well? Is the SSH port on the newly deployed nodes also 9922, or is it 9922 only on the originally deployed nodes?

| username: 我是咖啡哥 | Original post link

If the SSH port of the newly added servers differs from the original ones, you need to specify ssh_port under each host, as in the sketch below.
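If only some hosts differ, ssh_port can also be set per instance, where it overrides the global value. A hedged sketch (host names are illustrative):

```yaml
pd_servers:
  - host: hdp010
    ssh_port: 9922   # per-host override; takes precedence over global.ssh_port
tikv_servers:
  - host: hdp011     # no override, so global.ssh_port applies
```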

| username: TiDBer_pkQ5q1l0 | Original post link

Is ssh_port specified under the global section? Also, upgrade tiup and the tiup cluster component to the latest versions.
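For reference, the usual upgrade commands are:

```shell
tiup update --self      # upgrade the tiup binary itself
tiup update cluster     # upgrade the tiup-cluster component
```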

| username: Jiawei | Original post link

Check whether it is using the default port 22; if so, change it.

| username: johnnnyli | Original post link

All nodes use SSH port 9922. In the final step of the scale-out, distributing the configuration files to the old nodes failed, while it succeeded for the new nodes.

| username: johnnnyli | Original post link

The global configuration is set to 9922.

| username: Jiawei | Original post link

Please describe the specific error message and the operations you performed.

| username: johnnnyli | Original post link

The operation was to run the scale-out command tiup cluster scale-out tidb-pro /data02/soft/tidb/tidbandtikv-scale-out.yaml. The installation succeeded, but configuration distribution failed on the old nodes.
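Worth checking at this point: for the already-deployed nodes, tiup uses the topology stored in the cluster metadata, not the scale-out YAML, so the recorded ssh_port for the old hosts is what matters. It can be inspected and corrected with:

```shell
# Opens the stored topology of cluster tidb-pro in an editor;
# verify that global.ssh_port (or each old host's ssh_port) is 9922.
tiup cluster edit-config tidb-pro
```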

| username: johnnnyli | Original post link

Error: init config failed: gwmidc010:2379: transfer from /root/.tiup/storage/cluster/clusters/tidb-pro/config-cache/pd-gwmidc010-2379.service to /tmp/pd_ae4ac8d1-b7f2-4ded-9697-020f048b0e3a.service failed:
failed to scp /root/.tiup/storage/cluster/clusters/tidb-pro/config-cache/pd-gwmidc010-2379.service: ssh: handshake failed: ssh: unable to authenticate,
attempted methods [none publickey], no supported methods remain

| username: johnnnyli | Original post link

All are 9922

| username: Jiawei | Original post link

The error still looks SSH-related. Confirm that the required keys have all been distributed. The troubleshooting direction remains the SSH authentication failure: [none publickey].
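A verbose SSH attempt against one of the failing hosts shows which keys are offered and why the server rejects them (host and port taken from this thread):

```shell
ssh -vvv -p 9922 tidb@hdp010 exit
```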

| username: johnnnyli | Original post link

I’ll check it again, thank you.

| username: johnnnyli | Original post link

There is no problem with the public key; both the root and tidb users can SSH without issues. When tiup distributes files, does it use the key from the tiup directory (publicKey=/root/.tiup/storage/cluster/clusters/tidb-pro/ssh/id_rsa.pub) or the key under the root user's home directory? I found that the key in the tiup directory is not the same as the one in the user's home directory.
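As far as I understand, tiup's built-in SSH client authenticates with the key pair it generated at deploy time under the cluster directory, not with the keys in the user's ~/.ssh, so the tiup public key must be present in authorized_keys on every host. A quick test against an old node (the private key name is assumed to match the .pub path above):

```shell
# If this fails, the tiup-generated public key is missing from
# tidb@hdp010's ~/.ssh/authorized_keys on the old host.
ssh -i /root/.tiup/storage/cluster/clusters/tidb-pro/ssh/id_rsa \
    -p 9922 tidb@hdp010 'echo ok'
```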

| username: xingzhenxiang | Original post link

Have you specified the user for scale-out? I usually scale out like this:

tiup cluster scale-out tidb-cname scale-out2023031601.yaml --user root -p

| username: johnnnyli | Original post link

If --user is not specified, it should default to the current user. Does -p prompt for a password?

| username: xingzhenxiang | Original post link

I didn’t set up passwordless access here, so I’m using a password.

| username: cassblanca | Original post link

Reconfigure SSH mutual trust on every machine, then log in to each machine once in advance. Make sure the permissions on the key files are set correctly.
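For reference, sshd refuses publickey authentication when key file permissions are too open; the usual settings, applied as the tidb user on each host, are:

```shell
chmod 700 ~/.ssh                    # directory must not be group/world writable
chmod 600 ~/.ssh/authorized_keys    # same for authorized_keys
chmod 600 ~/.ssh/id_rsa             # private key must be readable by owner only
```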