Prometheus Scale-Out Failure

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: prometheus扩容失败

| username: TiDBer_iLonNMYE

[TiDB Usage Environment] Testing
[TiDB Version] V5.4.3
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
When I deployed the TiDB cluster, Prometheus's default port 9090 conflicted with a port already held by the systemd process on Kylin V10. Because systemd is the first process (PID 1), it cannot be killed. After deployment, I tried to change the Prometheus port by scaling Prometheus in and then scaling it out again on a different port.
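Before choosing a replacement port, it can help to confirm which ports already have listeners on the target host. The following is only a minimal sketch: `port_listed` is a hypothetical helper name, and it assumes the input has the column layout printed by `ss -ltn` (local address and port in the fourth column).

```shell
# Hypothetical helper: reads `ss -ltn` style output on stdin and succeeds
# (exit status 0) if any listening socket is bound to the given port.
port_listed() {
  awk -v p="$1" '
    # Skip the header line; the 4th column is "Local Address:Port".
    NR > 1 {
      n = split($4, parts, ":")        # the port is the last ":"-separated field
      if (parts[n] == p) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Usage on the target host (assumes the `ss` tool from iproute2 is installed):
#   ss -ltn | port_listed 9090 && echo "port 9090 already has a listener"
#   ss -ltn | port_listed 9099 || echo "port 9099 looks free"
```

Running `ss -ltnp` as root additionally shows the owning process, which is how a listener held by systemd would show up.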
Scaling in went smoothly with this command: tiup cluster scale-in demo -N :9090
Scaling out encountered an error:
cat prometheus_9099.yaml
monitoring_servers:
  - host:
    port: 9099

tiup cluster scale-out demo prometheus_9099.yaml
Error: none of ssh password, identity file, SSH_AUTH_SOCK specified(tui.id_read_failed)
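The error above says that no SSH credential was supplied: no password, no identity file, and no ssh-agent socket. `tiup cluster scale-out` accepts `-p` to prompt for a password, `-i` for a private key file, and `--user` for the SSH login user (it defaults to the user running tiup). The sketch below only prints a candidate command rather than running it; `scale_out_cmd` is a hypothetical helper, and the user name and key path in the usage lines are assumptions, not values from this thread.

```shell
# Hypothetical helper: prints (does not execute) a scale-out command with an
# authentication option picked from what is available on the control machine.
scale_out_cmd() {
  cluster="$1"; topo="$2"; user="$3"; key="$4"
  base="tiup cluster scale-out $cluster $topo --user $user"
  if [ -n "$key" ] && [ -f "$key" ]; then
    echo "$base -i $key"     # private key file exists: use it
  elif [ -n "$SSH_AUTH_SOCK" ]; then
    echo "$base"             # ssh-agent is available: no extra flag needed
  else
    echo "$base -p"          # fall back to an interactive password prompt
  fi
}

# Usage (user and key path are placeholders):
#   scale_out_cmd demo prometheus_9099.yaml root "$HOME/.ssh/id_rsa"
#   scale_out_cmd demo prometheus_9099.yaml root ""
```

Passing `--user` explicitly is worth doing here, because a password that is correct for one account will still be rejected if tiup logs in as a different one.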

| username: tidb菜鸟一只

Are the other components fine? Could it be that the operating system isn't supported?

| username: TiDBer_iLonNMYE

I have updated the error message again, please take a look.
Operating System: Kylin V10, CPU: Hygon, TiDB v5.4.3

| username: TiDBer_iLonNMYE

When I used -p to provide the password, the result was:
+ Detect CPU Arch Name:
  • detecting node Arch info … Error

Error: failed to fetch cpu-arch or kernel-name: executor.ssh.execute_failed: Failed to execute command over SSH for ‘’ …
  cause: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain

| username: xfworld

This is an SSH authentication failure:

cause: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain
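The `[none password]` list means password authentication was attempted and rejected by the target host. Besides a wrong password, one possibility (not confirmed anywhere in this thread) is that sshd on the target refuses password logins for that user, for example through `PermitRootLogin` or `PasswordAuthentication`. The sketch below reads such a setting from an sshd configuration file; `sshd_setting` is a hypothetical helper, and it deliberately ignores `Match` blocks and `Include` files, so treat its answer as a hint only.

```shell
# Hypothetical helper: prints the first uncommented value of an sshd option
# in the given config file, or "unset" if the option does not appear.
sshd_setting() {
  awk -v k="$1" '
    # sshd keywords are case-insensitive; commented lines start with "#"
    # and therefore never match the bare keyword.
    tolower($1) == tolower(k) { v = $2; exit }
    END { print (v == "" ? "unset" : v) }
  ' "$2"
}

# Usage on the target host:
#   sshd_setting PasswordAuthentication /etc/ssh/sshd_config
#   sshd_setting PermitRootLogin /etc/ssh/sshd_config
```

A direct cross-check is to log in by hand with the same user and password from the machine that runs tiup, for example `ssh -o PubkeyAuthentication=no <user>@<host> 'uname -m; uname -s'`, where `<user>` and `<host>` are placeholders.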

| username: TiDBer_iLonNMYE

However, the password is correct. If possible, could the official team try to reproduce this in a lab by scaling Prometheus in and then scaling it out again?

Since this was a newly created environment, I simply destroyed the cluster and redeployed it. Prometheus is now working fine.
