Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: After applying separation of powers to the system, the TiDB cluster cannot start; mutual trust has already been reconfigured
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 6.5.0
[Encountered Issue: Issue Phenomenon and Impact]
[Attachments: Screenshots/Logs/Monitoring]
Error Description
Error: failed to start pd: failed to start: 10.0.55.26 pd-2379.service, please check the instance's log(/tidb-deploy/pd-2379/log) for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.55.26:22' {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl start pd-2379.service"}, cause: ssh: handshake failed: write tcp 10.0.55.23:39562->10.0.55.26:22: write: permission denied
Please share the /tidb-deploy/pd-2379/log on 10.0.55.26 for review.
failed to start: 10.0.55.26 pd-2379.service, please check the instance's log(/tidb-deploy/pd-2379/log) for more detail
Can you log in to the 55.25 server as the tidb user? Does that user have sudo privileges?
The root user on the operating system currently doesn’t have the su or sudo command. It was running normally before, but after configuring the separation of powers, it cannot start. Help~
There is no sudo command:
-bash: sudo: command not found
Are there no ERROR-level messages in the log?
Did you perform some operation to delete sudo?
yum install sudo
The tidb user doesn't have the required permissions. Please grant sudo privileges to the tidb user on the server.
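For reference, here is a minimal sketch of what granting sudo to the tidb user could look like. The rule is the common passwordless form. The helper names and the staging path are made up for illustration, and whether a blanket NOPASSWD rule is acceptable under a separation-of-powers policy is something only your security administrator can decide.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the staging path are hypothetical.

# Print a passwordless sudo rule for one user (defaults to tidb).
sudoers_rule() {
    local user="${1:-tidb}"
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user"
}

# Write the rule to a staging file with the mode sudoers files normally use.
stage_sudoers_rule() {
    local user="$1" out="$2"
    sudoers_rule "$user" > "$out" && chmod 0440 "$out"
}

# Example: stage_sudoers_rule tidb /tmp/tidb.sudoers
```

Check the staged file with `visudo -cf /tmp/tidb.sudoers` before copying it as root into /etc/sudoers.d/.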
The separation of powers has been disabled, sudo has been restored, and the same error occurs.
I have added sudo, but the same error occurs.
You can start the other nodes in order first, then start the TiKV nodes one by one, leaving the problematic one for last (to ensure the cluster runs first). If possible, take this problematic node offline and add a new machine to the cluster first, then reconfigure the environment on the problematic machine.
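The start order described above could be scripted roughly like this, assuming you pass in the node IDs (host:port) yourself. The cluster name, the function names, and the dry-run switch are hypothetical. `-N` is the `tiup cluster start` option for starting specific nodes. By default the script only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch only: CLUSTER is a placeholder name, BAD_NODE is the PD instance from the error above.
CLUSTER="${CLUSTER:-my-cluster}"
BAD_NODE="${BAD_NODE:-10.0.55.26:2379}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start every listed node one by one, leaving the problematic node for last.
start_bad_node_last() {
    local node
    for node in "$@"; do
        if [ "$node" != "$BAD_NODE" ]; then
            run tiup cluster start "$CLUSTER" -N "$node"
        fi
    done
    run tiup cluster start "$CLUSTER" -N "$BAD_NODE"
}

# Example: start_bad_node_last <node-id> <node-id> ...
```

Take the real node IDs from `tiup cluster display <cluster-name>` and list them in the order you want them started.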
It's probably still a permissions issue.
Thank you, expert. I’ll study it further.
First, on the target machine, as the tidb user, use sudo to create a file in the /root directory as a test.
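A minimal sketch of that test, meant to be run as the tidb user on the target machine. The function name, the probe file name, and the `SUDO_CMD` variable are illustrative choices. `sudo -n` makes sudo fail immediately instead of prompting for a password, which is closer to how a non-interactive SSH session behaves.

```shell
#!/usr/bin/env bash
# Sketch only: function, variable, and probe file names are hypothetical.
SUDO_CMD="${SUDO_CMD:-sudo -n}"

# Try to create and then remove a probe file in the given directory via sudo.
sudo_write_test() {
    local dir="$1"
    local probe="$dir/.sudo_probe.$$"
    if $SUDO_CMD touch "$probe" 2>/dev/null; then
        $SUDO_CMD rm -f "$probe"
        echo "OK: can write to $dir via sudo"
    else
        echo "FAIL: cannot write to $dir via sudo"
        return 1
    fi
}

# Example: sudo_write_test /root
```

If this passes locally but the cluster still fails to start, the problem is more likely on the connection side. The error above is an SSH handshake failure reported when writing from 10.0.55.23, not a sudo failure.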
There doesn’t seem to be a problem.
Refer to this and see if it helps?