SSH Error During Environment Check for TiDB Offline Deployment

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 离线部署检查环境 SSH 报错

| username: fish

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.3
[Reproduction Path]

  1. As a regular user, run the pre-installation check command: tiup cluster check ./topology.yaml --user jhdcp --ssh system

Error: Error: failed to fetch cpu-arch or kernel-name: executor.ssh.execute_failed: Failed to execute command over SSH for 'jhdcp@xx.xx.xx.122:22' {ssh_stderr: Connection timed out during banner exchange, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "uname -s"}, cause: exit status 255

  2. However, I have verified that passwordless SSH works normally.

[Encountered Problem: Problem Phenomenon and Impact]
How should this be handled?

[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]

  1. Error log:
    tiup-cluster-debug-2023-08-29-11-59-46.log.txt (38.3 KB)

  2. topology.yaml file
    topology.yaml (3.1 KB)

| username: zhanggame1 | Original post link

If you are not deploying as the root user, the deployment user needs full sudo privileges in addition to passwordless SSH access.

| username: fish | Original post link

I have already checked, and the user has sudo privileges on all machines. As shown in the screenshot above, running sudo uname -s returns the kernel name.

| username: redgame | Original post link

Move the contents of the /etc/ssh/ directory to the tmp directory and rerun.

| username: Kongdom | Original post link

Hosts 121 and 122 need SSH mutual trust. Try connecting to 122 over SSH to see whether it asks for a password.

Log in to the control machine as the tidb user and use ssh to log in to the target machine’s IP. If you can log in successfully without entering a password, it means the SSH mutual trust configuration is successful.
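A minimal sketch of that check in non-interactive form, assuming OpenSSH on the control machine. BatchMode=yes makes ssh fail instead of prompting for a password, and sudo -n does the same for sudo, so one command covers both passwordless SSH and passwordless sudo. The function name check_ssh_sudo is hypothetical; the user and masked address in the usage comment are the ones from this thread.

```shell
# check_ssh_sudo USER HOST
# Runs "sudo -n uname -s" on HOST as USER without ever prompting.
# Exits non-zero if SSH would need a password, sudo would need a
# password, or the connection is not established within 10 seconds.
check_ssh_sudo() {
  local user="$1" host="$2"
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" 'sudo -n uname -s' </dev/null
}

# Usage (replace the masked address with the real one):
#   check_ssh_sudo jhdcp xx.xx.xx.122 && echo "SSH and sudo OK"
```

If this prints the kernel name (for example Linux) without any prompt, both SSH trust and sudo are usable non-interactively for that host.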

| username: zhanggame1 | Original post link

It looks like some machines succeed and some fail. Compare the failing machines with the successful ones to find the differences.

| username: fish | Original post link

This operation is too dangerous; it would affect SSH itself.

| username: fish | Original post link

SSH mutual trust has been successfully configured.

| username: fish | Original post link

Yes, and the machines that fail differ from run to run. For example, 122 and 123 fail, and on the next run it might be 122 and 125 that fail.
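One way to quantify this kind of intermittent failure is to repeat a plain SSH connection to each host and count how often it fails. The following is only a sketch, assuming OpenSSH and an already configured passwordless login; count_ssh_failures is a hypothetical helper name, not part of tiup.

```shell
# count_ssh_failures USER HOST [ATTEMPTS]
# Tries ATTEMPTS (default 10) non-interactive SSH connections to HOST,
# one after another, and prints how many of them failed.
count_ssh_failures() {
  local user="$1" host="$2" attempts="${3:-10}" failures=0 i=0
  while [ "$i" -lt "$attempts" ]; do
    if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" true </dev/null >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$failures"
}

# Usage (replace the masked addresses with the real ones):
#   for h in xx.xx.xx.122 xx.xx.xx.123 xx.xx.xx.125; do
#     echo "$h: $(count_ssh_failures jhdcp "$h" 20) failures out of 20"
#   done
```

If every host shows occasional failures here as well, the problem is probably not specific to tiup or to a single machine.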

| username: Kongdom | Original post link

:sweat_smile: This is some advanced syntax, I can’t understand it~~~

Can you verify it with the official documentation?

| username: MrSylar | Original post link

When an SSH connection encounters the error “connection timed out during banner exchange” during the handshake phase, it is usually due to one of the following reasons:

  1. Network Connection Issues: There may be network connection failures or blockages preventing the establishment of the SSH connection. You can try checking if the network connection is normal and ensure smooth communication between the server and the client.
  2. Firewall or Security Group Configuration: There may be configuration issues with the firewall or security group that are blocking the establishment of the SSH connection. Ensure that the firewall or security group settings allow SSH connection traffic and verify your network security policies.
  3. SSH Service Configuration Issues: There may be configuration issues with the SSH service, such as incorrect port settings or invalid SSH configuration parameters. Check the SSH server’s configuration file (usually /etc/ssh/sshd_config) to ensure that the port number, authentication options, and other settings are correct.
  4. SSH Server Issues: There may be issues with the SSH server itself, such as resource limitations, service anomalies, or other problems causing the SSH connection to fail. You can try restarting the SSH server and checking the system logs for more detailed information.

When debugging this issue, use the ping command to check basic network connectivity, use the telnet command to test whether the SSH port is reachable, and check the system log files for more detailed error information. If the problem persists, consult a system administrator or network expert.
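As a complement to ping and telnet, the sketch below reads the first line the server sends on the SSH port, which is the banner the client was waiting for when it timed out. It assumes bash built with /dev/tcp support and the coreutils timeout command; read_banner and is_ssh_banner are hypothetical helper names.

```shell
# read_banner HOST [PORT] [SECS]
# Opens a TCP connection to HOST:PORT (default 22) and prints the first
# line the server sends. Gives up after SECS seconds (default 10), in
# which case timeout exits with status 124.
read_banner() {
  local host="$1" port="${2:-22}" secs="${3:-10}"
  timeout "$secs" bash -c \
    'exec 3<>"/dev/tcp/$1/$2" && IFS= read -r -u 3 line && printf "%s\n" "$line"' \
    _ "$host" "$port"
}

# is_ssh_banner LINE
# Succeeds only if LINE looks like an SSH identification string.
is_ssh_banner() {
  case "$1" in
    SSH-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (replace the masked address with the real one):
#   banner="$(read_banner xx.xx.xx.122 22 10)" && is_ssh_banner "$banner" \
#     && echo "banner received: $banner"
```

An immediate failure suggests the port is closed or filtered. A status of 124 means nothing arrived within the time limit, which is consistent with the banner never being sent, although a silently dropped connection attempt looks the same.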

| username: Fly-bird | Original post link

It looks like a permission issue.

| username: 我是咖啡哥 | Original post link

Why do so many people run into SSH issues? I always use a regular user with sudo privileges and rely on the SSH setup that check apply configures automatically, and I've never had any problems. :grinning:

| username: Kongdom | Original post link

:sweat_smile: As someone who is encountering CentOS for the first time because of TiDB, I am indeed quite confused in many areas.

| username: TiDB_C罗 | Original post link

Check whether sshd_config allows root login.
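A small sketch for reading that setting on a target machine, assuming OpenSSH, where sshd -T prints the effective server configuration with lower-case keywords. The helper name sshd_setting is hypothetical.

```shell
# sshd_setting KEY
# Prints the effective value of one sshd option, taken from "sshd -T"
# rather than from the raw file, so defaults and includes are resolved.
sshd_setting() {
  local key="$1"
  sudo sshd -T 2>/dev/null | awk -v k="$key" 'tolower($1) == tolower(k) { print $2 }'
}

# Usage (run on the target machine):
#   sshd_setting permitrootlogin
```

The same helper can be used to compare other sshd options between a machine that passes the check and one that fails.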