After implementing the separation of powers, the TiDB cluster fails to start despite reconfiguring mutual trust

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 系统做了三权分立之后,tidb集群无法启动,已重新配置互信

| username: TiDBer_m6jfaJ6v

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] 6.5.0

Error Description
Error: failed to start pd: failed to start: 10.0.55.26 pd-2379.service, please check the instance's log(/tidb-deploy/pd-2379/log) for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.55.26:22' {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl start pd-2379.service"}, cause: ssh: handshake failed: write tcp 10.0.55.23:39562->10.0.55.26:22: write: permission denied


| username: Kongdom | Original post link

Please share the /tidb-deploy/pd-2379/log on 10.0.55.26 for review.

failed to start: 10.0.55.26 pd-2379.service, please check the instance’s log(/tidb-deploy/pd-2379/log) for more detail

| username: 像风一样的男子 | Original post link

Can you log in to the 55.25 server as the tidb user? Does that user have sudo privileges?
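A minimal sketch of that check, run from the control machine. The user, host, and port are taken from the error message (tidb@10.0.55.26:22); `build_check_cmd` is a hypothetical helper name. It only prints the command so it can be reviewed first. Note that plain `ssh` uses the current user's default identity, which may not be the same key TiUP uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ssh command that checks both login and
# passwordless sudo for a given user and host.
build_check_cmd() {
    local user="$1" host="$2" port="${3:-22}"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    # "sudo -n true" fails immediately if sudo would need a password.
    printf 'ssh -o BatchMode=yes -o ConnectTimeout=5 -p %s %s@%s "sudo -n true"\n' \
        "$port" "$user" "$host"
}

# Print the command for the host named in the error message.
build_check_cmd tidb 10.0.55.26
```

If the printed command exits with status 0 when run, both the SSH login and passwordless sudo work for that user.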

| username: TiDBer_m6jfaJ6v | Original post link

The root user on the operating system currently cannot use the su or sudo command. The cluster was running normally before, but after the separation of powers was configured, it can no longer start. Help~

| username: TiDBer_m6jfaJ6v | Original post link

There is no sudo command:
-bash: sudo: command not found

| username: TiDBer_m6jfaJ6v | Original post link

This

| username: Kongdom | Original post link

Are there no ERROR-level messages in the log?
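One way to pull out only the ERROR-level lines, as a rough sketch. The log directory comes from the error message; `show_errors` is a hypothetical helper, and it assumes the log files end in `.log` and mark the level as `[ERROR]`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last 20 ERROR-level lines found in a log directory.
show_errors() {
    local log_dir="$1"
    # -h drops file name prefixes; -F matches "[ERROR]" as a literal string.
    # Errors from a missing directory or unreadable files are discarded.
    grep -hF '[ERROR]' "$log_dir"/*.log 2>/dev/null | tail -n 20
}

# Directory named in the TiUP error message (run this on 10.0.55.26).
show_errors /tidb-deploy/pd-2379/log
```

Empty output means either that no ERROR lines were logged or that the directory could not be read.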

| username: 像风一样的男子 | Original post link

Did you perform some operation that removed sudo? If so, reinstall it:
yum install sudo

| username: Fly-bird | Original post link

The tidb user doesn't have permissions. Please grant sudo privileges to the tidb user on the server.
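A sketch of what granting sudo could look like, assuming a passwordless rule for the tidb user is acceptable under your security policy. `sudoers_line` is a hypothetical helper, and the `/etc/sudoers.d/tidb` drop-in path assumes the distribution includes that directory from `/etc/sudoers`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sudoers rule granting passwordless sudo to one user.
sudoers_line() {
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Show the rule for the tidb user.
sudoers_line tidb

# To apply it (as root; the drop-in path is an assumption about the distro):
#   sudoers_line tidb > /etc/sudoers.d/tidb
#   chmod 0440 /etc/sudoers.d/tidb
#   visudo -cf /etc/sudoers.d/tidb   # syntax check before relying on it
```

Keeping the apply steps as comments means the script changes nothing until the rule has been reviewed.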

| username: TiDBer_m6jfaJ6v | Original post link

The separation of powers has been disabled, sudo has been restored, and the same error occurs.

| username: TiDBer_m6jfaJ6v | Original post link

I have added sudo, but the same error occurs.

| username: Jolyne | Original post link

You can start the other nodes in order first, then start the TiKV nodes one by one, leaving the problematic one for last (so that the cluster is running first). If possible, take the problematic node offline, add a new machine to the cluster, and then reconfigure the environment on the problematic machine.
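The "problematic node last" ordering might be sketched like this with `tiup cluster start` and its `-N` (node) option. The cluster name and the first two node IDs are hypothetical placeholders; only 10.0.55.26:2379 comes from the error message. Take the real IDs from `tiup cluster display`. The script prints the plan as a dry run and starts nothing.

```shell
#!/usr/bin/env bash
CLUSTER="tidb-test"              # hypothetical cluster name
PROBLEM_NODE="10.0.55.26:2379"   # PD instance named in the error message
# Hypothetical node IDs; replace with the IDs shown by `tiup cluster display`.
NODES="10.0.55.24:2379 10.0.55.25:2379 10.0.55.26:2379"

# Print one start command per node, with the problematic node moved to the end.
plan_start() {
    local node
    for node in $NODES; do
        [ "$node" = "$PROBLEM_NODE" ] && continue
        echo "tiup cluster start $CLUSTER -N $node"
    done
    echo "tiup cluster start $CLUSTER -N $PROBLEM_NODE"
}

# Dry run: review the printed commands before executing any of them.
plan_start
```

Printing the commands first makes it easy to confirm the order and the node IDs against the actual topology.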

| username: Kongdom | Original post link

It’s probably still a permissions issue :thinking:

| username: TiDBer_m6jfaJ6v | Original post link

Thank you, expert. I’ll study it further.

| username: TiDBer_m6jfaJ6v | Original post link

Thank you, expert.

| username: 路在何chu | Original post link

First, on the target machine, as the tidb user, use sudo to create a file in the /root directory as a test.
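That test could look roughly like the following, run as the tidb user on the target machine. The file name is a hypothetical placeholder, and `sudo_write_test` is a helper name made up for this sketch.

```shell
#!/usr/bin/env bash
TEST_FILE="/root/.tiup_sudo_test"   # hypothetical file name

# Try to create and then remove a file through sudo, and report the result.
sudo_write_test() {
    # -n: never prompt; fail at once if a password would be required.
    if sudo -n touch "$1"; then
        sudo -n rm -f "$1"
        echo "sudo OK"
    else
        echo "sudo FAILED"
    fi
}

sudo_write_test "$TEST_FILE"
```

Any error text that sudo writes to stderr (for example "command not found" or "a password is required") is left visible, since it usually points at the cause.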

| username: TiDBer_m6jfaJ6v | Original post link

There doesn’t seem to be a problem.

| username: Kongdom | Original post link

Refer to this and see if it helps?