TiUP Cluster Deployment Stuck

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiUP cluster deploy 卡住

| username: chy1013m1

【TiDB Usage Environment】Production Ubuntu 22.04 LTS
【TiDB Version】v6.1.0
【Encountered Problem】
tiup cluster deploy often hangs at unpredictable points, sometimes during the copy step and sometimes during mkdir.

I have tried running tiup with --ssh system.

I have also confirmed that the control machine can reach every node over SSH and that passwordless SSH is configured.

Log:

signal: killed”, “errorVerbose”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@172.21.32.155:22’ {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c “test -d /data/tidb/tidb-deploy || (mkdir -p /data/tidb/tidb-deploy && chown tidb:$(id -g -n tidb) /data/tidb/tidb-deploy)”}, cause: signal: killed

2022-07-14T14:22:03.590-0400 DEBUG TaskFinish {“task”: “Mkdir: host=172.21.32.155, directories=‘/data/tidb/tidb-deploy/monitor-19100’,‘/data/tidb/tidb-data/monitor-19100’,‘/data/tidb/tidb-deploy/monitor-19100/log’,‘/data/tidb/tidb-deploy/monitor-19100/bin’,‘/data/tidb/tidb-deploy/monitor-19100/conf’,‘/data/tidb/tidb-deploy/monitor-19100/scripts’”, “error”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@172.21.32.155:22’ {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "test -d /data/tidb/tidb-deploy || (mkdir -p /data/tidb/tidb-deploy && chown tidb:$(id -g -n tidb) /data/tidb/tidb-deploy)"}, cause: signal: killed”, “errorVerbose”: “executor.ssh.execute_failed: Failed to execute command over SSH for ‘tidb@172.21.32.155:22’ {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "test -d /data/tidb/tidb-deploy || (mkdir -p /data/tidb/tidb-deploy && chown tidb:$(id -g -n tidb) /data/tidb/tidb-deploy)"}, cause: signal: killed
at github.com/pingcap/tiup/pkg/cluster/executor.(*NativeSSHExecutor).Execute()
\tgithub.com/pingcap/tiup/pkg/cluster/executor/ssh.go:338
at github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute()
\tgithub.com/pingcap/tiup/pkg/cluster/executor/checkpoint.go:85
at github.com/pingcap/tiup/pkg/cluster/task.(*Mkdir).Execute()
\tgithub.com/pingcap/tiup/pkg/cluster/task/mkdir.go:61
at github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute()
\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86
at github.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute()
\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111
at github.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1()
\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144
at runtime.goexit()
\truntime/asm_amd64.s:1571
github.com/pingcap/errors.AddStack
\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tiup/pkg/cluster/task.(*Mkdir).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/task/mkdir.go:63
github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86
github.com/pingcap/tiup/pkg/cluster/task.(*StepDisplay).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/task/step.go:111
github.com/pingcap/tiup/pkg/cluster/task.(*Parallel).Execute.func1
\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:144
runtime.goexit
\truntime/asm_amd64.s:1571”}

| username: songxuecheng | Original post link

  1. Refer to this topic: https://asktug.com/t/topic/95777
  2. Is the firewall enabled?

| username: chy1013m1 | Original post link

Hi, we use the “tidb” user everywhere. SSHing from the control machine to tidb@172.21.32.155 works without problems, but tiup still hangs.

| username: hey-hoho | Original post link

Does the tidb user have sudo permissions configured on all nodes?

| username: chy1013m1 | Original post link

Yes, the tidb user can sudo:
[image]

| username: songxuecheng | Original post link

Try configuring passwordless SSH manually.

| username: chy1013m1 | Original post link

Yes, passwordless SSH is configured.

| username: Hi70KG | Original post link

  1. Check whether there is sufficient disk space on each node.
  2. Ensure that SSH mutual trust is configured for the root user and the tidb user on each node.

| username: chy1013m1 | Original post link

Thanks, I have checked everything. At this point it looks more like SSH randomly not responding. For example, the deployment has three TiKV nodes: .123, .124, and .125. Sometimes .123 gets stuck, sometimes .124 or .125, at random. I have canceled the deployment with Ctrl+C and retried several times, but it always fails.

Once, I left it running in a screen session overnight, and in the morning it was still stuck at the mkdir deploy step with the spinner turning.

It currently looks more like SSH randomly not responding.
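
One way to test the "SSH randomly not responding" theory outside of TiUP is to run the same kind of non-interactive command many times against each node and count the hangs. The following is a minimal sketch, assuming the OpenSSH client is installed on the control machine. The host names in HOSTS are hypothetical placeholders (the thread only gives the last octets), the probe command `sudo -n true` is a stand-in for the mkdir command in the log, and the `runner` parameter exists only so the logic can be exercised without a network.

```python
import subprocess
import time

# Hypothetical placeholders: replace with the real TiKV node addresses.
HOSTS = ["tikv-1.example", "tikv-2.example", "tikv-3.example"]


def build_ssh_command(user, host, remote_cmd, connect_timeout=10):
    """Return the argv list for one non-interactive ssh call."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "-o", f"ConnectTimeout={connect_timeout}",
        f"{user}@{host}",
        remote_cmd,
    ]


def probe(user, host, remote_cmd="sudo -n true", hard_timeout=30,
          runner=subprocess.run):
    """Run one probe and return (status, elapsed_seconds).

    status is "ok", "failed" (non-zero exit), or "timeout" (the call
    did not finish within hard_timeout seconds and was killed).
    """
    argv = build_ssh_command(user, host, remote_cmd)
    start = time.monotonic()
    try:
        result = runner(argv, capture_output=True, timeout=hard_timeout)
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start


def probe_rounds(user, hosts, rounds=20, **kwargs):
    """Probe every host `rounds` times and count the outcomes per host."""
    summary = {host: {"ok": 0, "failed": 0, "timeout": 0} for host in hosts}
    for _ in range(rounds):
        for host in hosts:
            status, _elapsed = probe(user, host, **kwargs)
            summary[host][status] += 1
    return summary

# Example call (performs real SSH connections):
#   print(probe_rounds("tidb", HOSTS, rounds=20))
```

If the timeout counts are non-zero and spread across different hosts, the problem is likely below TiUP (network, sshd, or sudo on the nodes) rather than in the deployment tool itself.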

| username: cs58_dba | Original post link

As a more drastic step, you could try switching the operating system to CentOS 7.

| username: ealam_小羽 | Original post link

Could the SSH version on the machine be incompatible with the built-in SSH client that TiUP uses by default? Refer to the TiUP documentation and try adding the parameter that makes TiUP use the system SSH client instead.

| username: chy1013m1 | Original post link

I have also tried this (--ssh system), and it still gets stuck at random.
I have also tried -c values from 1 to 10, and it still gets stuck.

Thanks.
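
For reference, the flag combinations described in this reply can be assembled as follows. This is only a sketch: the cluster name `tidb-test` and the file `./topology.yaml` are hypothetical, and the `--ssh-timeout` and `--wait-timeout` flags are an additional knob that nobody in this thread reports having tried, so treat them as an assumption about what might help rather than a confirmed fix.

```python
import shlex


def build_deploy_command(cluster, version, topology, user="tidb",
                         concurrency=1, ssh_timeout=None, wait_timeout=None):
    """Return the argv list for a `tiup cluster deploy` run.

    Uses the system SSH client (--ssh system) and an explicit
    concurrency (-c), as tried in the thread. The two timeout flags
    are optional and are only added when a value is given.
    """
    argv = [
        "tiup", "cluster", "deploy", cluster, version, topology,
        "--user", user,
        "--ssh", "system",
        "-c", str(concurrency),
    ]
    if ssh_timeout is not None:
        argv += ["--ssh-timeout", str(ssh_timeout)]    # seconds
    if wait_timeout is not None:
        argv += ["--wait-timeout", str(wait_timeout)]  # seconds
    return argv


# Print the command line instead of running it, so it can be reviewed first.
command = build_deploy_command("tidb-test", "v6.1.0", "./topology.yaml",
                               concurrency=1, ssh_timeout=60, wait_timeout=600)
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch side-effect free; pass the list to `subprocess.run` only after checking it against your own topology.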

| username: ablewang_xiaobo | Original post link

Before deploying, it is recommended to run tiup cluster check ./*.yaml to see whether any check items fail. If any do, fix them first.
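
A small sketch of how that pre-deployment check could be scripted. The topology path `./topology.yaml` is a hypothetical file name, and the optional `--apply` flag (which asks TiUP to try to repair failed items automatically) goes beyond what this reply suggests, so it is off by default here.

```python
def build_check_command(topology, user="tidb", apply=False):
    """Return the argv list for `tiup cluster check` on a topology file."""
    argv = ["tiup", "cluster", "check", topology, "--user", user]
    if apply:
        # Let TiUP attempt to fix failed check items automatically.
        argv.append("--apply")
    return argv


# Show the command that would be run for a plain check.
print(" ".join(build_check_command("./topology.yaml")))
```

Running the plain check first and reading the results before considering `--apply` keeps changes to the target machines deliberate.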
