SSH error when starting tiup

translator_bot · June 23, 2024, 4:40am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiup启动时报SSH错误 (SSH error reported when starting tiup)

| username: TiDBer_8rWAgqMU

When using tiup to start the TiDB cluster, an error occurred (there was no error during installation):
Error: failed to start prometheus: failed to start: slave007 prometheus-9090.service, please check the instance's log(/home/tidb/.tiup/tidb-deploy/prometheus-9090/log) for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@slave007:22' {ssh_stderr: Failed to start prometheus-9090.service: Unit not found.
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl start prometheus-9090.service"}, cause: Process exited with status 5

Prerequisite: I have already set up passwordless SSH for both the tidb and root users between the control machine and the TiDB cluster machines. As shown below, SSH does not ask for a password:
[tidb@slave006 bin]$ ssh tidb@slave007
Last login: Tue Sep 6 10:40:09 2022
[tidb@slave007 ~]$

How can I solve this problem?

| username: wuxiangdong

If the root account is not set up for passwordless login, has the tidb account been granted sudo privileges?

| username: wuxiangdong

Try running sudo systemctl status prometheus-9090.service as the tidb user on slave007 to see if it works.

| username: TiDBer_8rWAgqMU

Should it be added on the control machine (slave006) or on slave007? Here is the sudo configuration for tidb on slave007:
[tidb@slave007 root]$ sudo -ll
Matching Defaults entries for tidb on slave007:
!visiblepw, always_set_home, match_group_by_gid, always_query_group_plugin, env_reset, env_keep="COLORS DISPLAY HOSTNAME HISTSIZE KDEDIR LS_COLORS", env_keep+="MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE",
env_keep+="LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES", env_keep+="LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE", env_keep+="LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY",
secure_path=/sbin:/bin:/usr/sbin:/usr/bin

User tidb may run the following commands on slave007:

Sudoers entry:
RunAsUsers: ALL
Options: !authenticate
Commands:
ALL
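In this output, "Options: !authenticate" together with "Commands: ALL" means tidb may run any command through sudo without a password (what a NOPASSWD: ALL sudoers rule produces), so sudo itself does not look like the cause. A minimal Python sketch that checks captured "sudo -ll" text for such an entry; the function name is hypothetical, and the parser only assumes the simple entry layout shown above.

```python
def passwordless_sudo_all(sudo_ll_output: str) -> bool:
    """Return True if some sudoers entry in `sudo -ll` output allows
    ALL commands with authentication disabled (Options: !authenticate)."""
    no_auth = False
    in_commands = False
    for raw in sudo_ll_output.splitlines():
        line = raw.strip()
        if line.startswith("Sudoers entry:"):
            # A new entry begins; reset the per-entry state.
            no_auth = False
            in_commands = False
        elif line.startswith("Options:"):
            options = [o.strip() for o in line[len("Options:"):].split(",")]
            no_auth = "!authenticate" in options
        elif line.startswith("Commands:"):
            in_commands = True
        elif in_commands and no_auth and line == "ALL":
            return True
    return False


# Entry copied from the slave007 output above.
sample = """Sudoers entry:
    RunAsUsers: ALL
    Options: !authenticate
    Commands:
\tALL
"""
print(passwordless_sudo_all(sample))  # True
```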

| username: TiDBer_8rWAgqMU

[tidb@slave007 .tiup]$ sudo systemctl status prometheus-9090.service
Unit prometheus-9090.service could not be found.
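This result suggests systemd has no unit definition for prometheus-9090 on slave007 at all, which fits the "Unit not found" message and exit status 5 in the original error: the SSH login and sudo succeeded, but the service itself is missing. A minimal Python sketch to confirm which unit files are absent; it assumes TiUP places its generated units in /etc/systemd/system (an assumption, so the directory is a parameter) and only checks that one directory.

```python
from pathlib import Path


def missing_units(unit_dir, unit_names):
    """Return the unit names that have no matching file in unit_dir."""
    base = Path(unit_dir)
    return [name for name in unit_names if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumption: TiUP-generated unit files live in /etc/systemd/system.
    print(missing_units("/etc/systemd/system", ["prometheus-9090.service"]))
```

An empty list means the file is present, and the problem lies elsewhere (for example, systemd not having been reloaded).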

| username: TiDBer_8rWAgqMU

Additionally, when I was installing DM just now, slave007 reported an SSH error:
Deploy TiDB instance

Copy dm-master → slave003 … Done
Copy dm-worker → slave004 … Done
Copy dm-worker → slave005 … Done
Copy dm-worker → slave006 … Done
Copy prometheus → slave007 … Error
Copy grafana → slave007 … Error
Copy alertmanager → slave003 … Done

Error: stderr: : executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@slave007:22' {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin tar --no-same-owner -zxf /tidb-deploy-dm/prometheus-8249/bin/prometheus-v6.2.0-linux-amd64.tar.gz -C /tidb-deploy-dm/prometheus-8249/bin && rm /tidb-deploy-dm/prometheus-8249/bin/prometheus-v6.2.0-linux-amd64.tar.gz}, cause: Run Command Timeout

| username: wuxiangdong

The deployment failed, so it couldn’t start successfully afterward.

| username: TiDBer_8rWAgqMU

The TiDB deployment did not fail; the cluster had started successfully before.
This deployment failure comes from DM, not TiDB.

| username: TiDBer_8rWAgqMU

The entire process was as follows: I first deployed TiDB, and it started without any issues. Later I installed DM, which also deployed and started without issues. However, DM's monitoring components were placed on the same node as TiDB's monitoring and used the same port. So I uninstalled DM, changed DM's monitoring port, and redeployed it. It seems that because the monitoring components overlapped, uninstalling DM also affected TiDB's monitoring component. As a result, the TiDB cluster now fails to start (only the monitoring component fails).
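The logs in this thread show systemd units named after component and port (prometheus-9090.service for TiDB; the redeployed DM Prometheus uses a prometheus-8249 deploy directory). If both clusters put Prometheus on slave007 with the same port, they would likely map to the same unit name, so destroying the DM cluster could remove the unit TiDB depends on. That is a plausible reading of the thread, not a confirmed diagnosis. A minimal Python sketch that flags such collisions before deploying; the tuple layout, the "tidb-cluster" name, and the alertmanager port are hypothetical, and the "<component>-<port>.service" naming is inferred from the logs above.

```python
from collections import defaultdict


def unit_collisions(instances):
    """Return {(host, unit): [clusters]} for units claimed by more than one cluster.

    instances: iterable of (cluster, host, component, port) tuples.
    """
    owners = defaultdict(set)
    for cluster, host, component, port in instances:
        unit = f"{component}-{port}.service"  # naming inferred from TiUP logs
        owners[(host, unit)].add(cluster)
    return {key: sorted(clusters)
            for key, clusters in owners.items() if len(clusters) > 1}


if __name__ == "__main__":
    # Hypothetical topology: the first DM deployment reused port 9090.
    instances = [
        ("tidb-cluster", "slave007", "prometheus", 9090),
        ("dm-test", "slave007", "prometheus", 9090),
        ("dm-test", "slave003", "alertmanager", 9093),
    ]
    print(unit_collisions(instances))
    # {('slave007', 'prometheus-9090.service'): ['dm-test', 'tidb-cluster']}
```

If a check like this reports a collision, giving the second cluster's monitoring components a different port (as was done for the DM redeployment) keeps the two clusters from sharing a unit.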

| username: TiDBer_8rWAgqMU

I used the command ./tiup dm destroy dm-test to uninstall DM.

| username: TiDBer_8rWAgqMU

Strangely, I just redeployed DM, and this time both the deployment and the startup succeeded. But TiDB still reports the error.

| username: Hacker_7Pu53geu

The first deployment of the DM monitoring component disrupted TiDB's monitoring module, which is why TiDB's monitoring now fails to start. You need to find the exact reason it cannot start and fix it. Alternatively, refer to the linked page "tiup install | PingCAP 文档中心" (the PingCAP documentation center) and try redeploying only the monitoring component.

| username: HACK

Does the system have permission restrictions? If so, the tidb user may need sudo permission to run bash.

| username: gary

Did you deploy using the tidb user? It might be a permissions issue.

| username: HACK

Judging by the commands in the original post, the deployment was done as the tidb user.
