TiDB 7.5.1 Expansion: TiKV Node Fails to Start

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb7.5.1扩容tikv节点无法启动

| username: yytest

I followed the official documentation to scale out a TiKV node, but the new node failed to start.
Scale-out configuration file:

cat <<EOF>/home/tidb/scale-out.yml
tikv_servers:
  - host: 192.168.2.22
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/tidb-deploy/tikv-20160
    data_dir: /data1/tidb-data/tikv-20160
    log_dir: /data1/tidb-deploy/tikv-20160/log
EOF

Check:

tiup cluster check tidb-cluster /home/tidb/scale-out.yml --cluster --user root -p -i /root/.ssh/id_rsa

Apply automatic fixes:

tiup cluster check tidb-cluster /home/tidb/scale-out.yml --cluster --apply --user root -p -i /root/.ssh/id_rsa

Scale out:

tiup cluster scale-out tidb-cluster /home/tidb/scale-out.yml --user root -p -i /root/.ssh/id_rsa

After starting the cluster and checking the topology, the newly added TiKV node (tikv04, 192.168.2.22) shows as Down. Please help me figure out how to locate the issue.

tiup cluster display tidb-cluster

Output:

Checking updates for component cluster... Timedout (after 2s)
Cluster type:       tidb
Cluster name:       tidb-cluster
Cluster version:    v7.5.1
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://192.168.2.15:2379/dashboard
Grafana URL:        http://192.168.2.20:3000
ID                  Role          Host          Ports                            OS/Arch       Status  Data Dir                            Deploy Dir
--                  ----          ----          -----                            -------       ------  --------                            ----------
192.168.2.20:9093   alertmanager  192.168.2.20  9093/9094                        linux/x86_64  Up      /data1/tidb-data/alertmanager-9093  /data1/tidb-deploy/alertmanager-9093
192.168.2.20:3000   grafana       192.168.2.20  3000                             linux/x86_64  Up      -                                   /data1/tidb-deploy/grafana-3000
192.168.2.14:2379   pd            192.168.2.14  2379/2380                        linux/x86_64  Up|L    /data1/tidb-data/pd-2379            /data1/tidb-deploy/pd-2379
192.168.2.15:2379   pd            192.168.2.15  2379/2380                        linux/x86_64  Up|UI   /data1/tidb-data/pd-2379            /data1/tidb-deploy/pd-2379
192.168.2.16:2379   pd            192.168.2.16  2379/2380                        linux/x86_64  Up      /data1/tidb-data/pd-2379            /data1/tidb-deploy/pd-2379
192.168.2.20:9090   prometheus    192.168.2.20  9090/12020                       linux/x86_64  Up      /data1/tidb-data/prometheus-9090    /data1/tidb-deploy/prometheus-9090
192.168.2.17:4000   tidb          192.168.2.17  4000/10080                       linux/x86_64  Up      -                                   /data1/tidb-deploy/tidb-4000
192.168.2.18:4000   tidb          192.168.2.18  4000/10080                       linux/x86_64  Up      -                                   /data1/tidb-deploy/tidb-4000
192.168.2.19:4000   tidb          192.168.2.19  4000/10080                       linux/x86_64  Up      -                                   /data1/tidb-deploy/tidb-4000
192.168.2.21:9000   tiflash       192.168.2.21  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data1/tidb-data/tiflash-9000       /data1/tidb-deploy/tiflash-9000
192.168.2.11:20160  tikv          192.168.2.11  20160/20180                      linux/x86_64  Up      /data1/tidb-data/tikv-20160         /data1/tidb-deploy/tikv-20160
192.168.2.12:20160  tikv          192.168.2.12  20160/20180                      linux/x86_64  Up      /data1/tidb-data/tikv-20160         /data1/tidb-deploy/tikv-20160
192.168.2.13:20160  tikv          192.168.2.13  20160/20180                      linux/x86_64  Up      /data1/tidb-data/tikv-20160         /data1/tidb-deploy/tikv-20160
192.168.2.22:20160  tikv          192.168.2.22  20160/20180                      linux/x86_64  Down    /data1/tidb-data/tikv-20160         /data1/tidb-deploy/tikv-20160

| username: Kamner | Original post link

Take a look at the logs of the scaled-out node. Has SSH mutual trust been set up for it?

| username: terry0219 | Original post link

Check the firewall.
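A minimal sketch of a reachability check, assuming bash and the coreutils `timeout` command are available. `port_open` is a hypothetical helper name; the host and ports in the example are the ones from the scale-out file above.

```shell
# Succeeds only if a TCP connection to host $1, port $2 opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so no extra tools are required.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (TiKV service and status ports on the new node):
#   for p in 20160 20180; do
#     port_open 192.168.2.22 "$p" && echo "$p reachable" || echo "$p blocked or closed"
#   done
```

While the TiKV process is down, its own ports look closed whether or not a firewall is involved. The check is likely more telling when run from the new node against a PD endpoint, for example port 2379 on 192.168.2.14.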

| username: DBAER | Original post link

First, check the network and see if manual SSH is possible.

| username: xiaoqiao | Original post link

What is inside /data1/tidb-deploy/tikv-20160/log?
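One way to answer this is to pull the most recent error entries out of the TiKV log on 192.168.2.22. This is only a sketch: `recent_tikv_errors` is a hypothetical helper, and it assumes the log file is the default `tikv.log` inside the `log_dir` from the scale-out file and that entries carry `[ERROR]` or `[FATAL]` level tags.

```shell
# Print the most recent ERROR/FATAL entries from a log file.
# $1: path to the log file; $2: maximum number of lines to print (default 20).
recent_tikv_errors() {
  grep -E '\[(ERROR|FATAL)\]' "$1" | tail -n "${2:-20}"
}

# Example, run on the new TiKV host:
#   recent_tikv_errors /data1/tidb-deploy/tikv-20160/log/tikv.log
```

If the function prints nothing, the process may have exited before writing to this file, in which case any other files in the same log directory are worth a look.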

| username: tidb菜鸟一只 | Original post link

Have you configured mutual trust?

| username: TiDBer_QYr0vohO | Original post link

You can try to SSH into the newly added TiKV machine from the control machine.
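A rough sketch of that test, assuming the OpenSSH client is installed on the control machine. `can_ssh` is a hypothetical helper; the user and key path in the example are the ones passed to `tiup` in the original post.

```shell
# Succeeds only if a non-interactive SSH login works within 5 seconds.
# $1: user@host; $2: path to the private key.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 -i "$2" "$1" true
}

# Example, run on the control machine:
#   can_ssh root@192.168.2.22 /root/.ssh/id_rsa && echo "SSH OK" || echo "SSH failed"
```

A failure here points at the network, the firewall, or key-based login rather than at TiKV itself.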

| username: Jolyne | Original post link

I see that this is an internal network address. Either the firewall has set restrictions, or SSH is not configured properly. I suggest checking both.

| username: TiDBer_JUi6UvZm | Original post link

Will there be any warnings or errors after running check and apply?
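To make that easier to answer, the check output can be filtered down to the rows that did not pass. This sketch assumes the result table has whitespace-separated columns with the result (`Pass`, `Warn`, `Fail`) in the third column; `check_problems` and the file name `check-output.txt` are hypothetical.

```shell
# Read `tiup cluster check` output on stdin and print only Warn/Fail rows.
check_problems() {
  awk '$3 == "Fail" || $3 == "Warn"'
}

# Example, with the check output saved to a file beforehand:
#   check_problems < check-output.txt
```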

| username: TiDBer_JUi6UvZm | Original post link

What error does the new TiKV node's log report?

| username: madcoder | Original post link

No error logs or check results have been posted, so we are all just guessing blindly :sweat_smile:

| username: dba远航 | Original post link

What do the logs from the scale-out operation say?

| username: 这里介绍不了我 | Original post link

:joy: Bro, could you provide some more information?

| username: shigp_TIDBER | Original post link

Logically, it shouldn’t be like this. Please post the entire log.

| username: TiDBer_RjzUpGDL | Original post link

Take a look at the logs.

| username: yytest | Original post link

After restarting the host, it worked fine. Thank you all for your replies.

| username: zhanggame1 | Original post link

Is it possible that TiKV was previously installed on this machine and the port is occupied?
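A small sketch for checking this on the new host, assuming the `ss` utility from iproute2 is installed and that its local address is the fourth column of `ss -lntp` output. `listeners_on_port` is a hypothetical helper name.

```shell
# Read `ss -lntp` output on stdin and print rows listening on port $1.
# Matches on the end of the local address column, so 2016 does not match 20160.
listeners_on_port() {
  awk -v port="$1" '$4 ~ (":" port "$")'
}

# Example, run as root on 192.168.2.22 so process names are shown:
#   ss -lntp | listeners_on_port 20160
#   ss -lntp | listeners_on_port 20180
```

Any row printed while the new TiKV instance is down would suggest a leftover process is still holding the port.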

| username: TiDBer_QYr0vohO | Original post link

Logs.

| username: jiayou64 | Original post link

It may be that resources on the host are too tight. We have occasionally run into this issue ourselves when restarting.