[Resolved After Reinstalling the Operating System] Error: failed to start tidb: failed to start:

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【自己重新安装操作系统后解决】Error: failed to start tidb: failed to start:

| username: 大炮的TiDB

【TiDB Environment】Test Environment
【TiDB Version】5.4.1
【Encountered Problem】
【Reproduction Path】What operations were performed to cause the problem
【Problem Phenomenon and Impact】
[tidb@T1018666 ~]$ tiup cluster display tidb-test
tiup is checking updates for component cluster …
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.10.2/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v5.4.1
Deploy user: tidb
SSH type: builtin
Grafana URL: http://10.22.231.124:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir


10.22.231.124:9093 alertmanager 10.22.231.124 9093/9094 linux/x86_64 Down /tidb-data/alertmanager-9093 /tidb-deploy/alertmanager-9093
10.22.231.124:3000 grafana 10.22.231.124 3000 linux/x86_64 Down - /tidb-deploy/grafana-3000
10.22.231.226:2379 pd 10.22.231.226 2379/2380 linux/x86_64 Down /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.22.231.227:2379 pd 10.22.231.227 2379/2380 linux/x86_64 Down /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.22.231.228:2379 pd 10.22.231.228 2379/2380 linux/x86_64 Down /tidb-data/pd-2379 /tidb-deploy/pd-2379
10.22.231.124:9090 prometheus 10.22.231.124 9090/12020 linux/x86_64 Down /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
10.22.231.229:4000 tidb 10.22.231.229 4000/10080 linux/x86_64 Down - /tidb-deploy/tidb-4000
10.22.231.230:4001 tidb 10.22.231.230 4001/10081 linux/x86_64 Down - /tidb-deploy/tidb-4001
10.22.231.231:4000 tidb 10.22.231.231 4000/10080 linux/x86_64 Down - /tidb-deploy/tidb-4000
10.22.231.124:9000 tiflash 10.22.231.124 9000/8123/3930/20170/20292/8234 linux/x86_64 N/A /data1/tidb-data/tiflash-9000 /data1/tidb-deploy/tiflash-9000
10.22.231.86:20160 tikv 10.22.231.86 20160/20180 linux/x86_64 N/A /data1/tidb-data/tikv-20160 /data1/tidb-deploy/tikv-20160
10.22.231.87:20161 tikv 10.22.231.87 20161/20181 linux/x86_64 N/A /data2/tidb-data/tikv-20161 /data2/tidb-deploy/tikv-20161
10.22.231.92:20160 tikv 10.22.231.92 20160/20180 linux/x86_64 N/A /data1/tidb-data/tikv-20160 /data1/tidb-deploy/tikv-20160

Error: failed to start tidb: failed to start: 10.22.231.231 tidb-4000.service, please check the instance’s log(/tidb-deploy/tidb-4000/log) for more detail.: timed out waiting for port 4000 to be started after 2m0s

Please help. TiDB was only successfully installed once; after upgrading it won't start, and reinstalling it also failed!
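The error points at the log directory on 10.22.231.231, so the first thing to look at is probably the instance log on that node. Something along these lines should show the most recent errors (the exact log file names may differ slightly by version):

ssh tidb@10.22.231.231
# show the last lines of the tidb-server log and its stderr output
tail -n 100 /tidb-deploy/tidb-4000/log/tidb.log
tail -n 100 /tidb-deploy/tidb-4000/log/tidb_stderr.log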

| username: 大炮的TiDB | Original post link

Here is the error log information that was returned.

| username: 大炮的TiDB | Original post link

When I first started the deployment, I ran the environment check and everything came back normal. The deployment could run through to the end, but it failed during initialization. It's so frustrating.
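For reference, the pre-deployment check was along these lines (topology.yaml here is just a placeholder for the actual topology file that was used):

# run the environment checks against the topology before deploying
tiup cluster check ./topology.yaml --user tidb
# optionally let tiup try to fix the failed check items automatically
tiup cluster check ./topology.yaml --apply --user tidb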

| username: 大炮的TiDB | Original post link

Both PD and TiKV can start, but TiDB, Alertmanager, Grafana, and Prometheus cannot start. Please help!

| username: db_user | Original post link

It's probably a port conflict, or the port isn't reachable. Check whether something is already running on port 4000 or port 9100 on the TiDB machine, and check whether the firewall is enabled. Also, for security it's better to anonymize information such as IP addresses in the logs.
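For example, something like the following on the TiDB host would show whether those ports are already taken and whether the firewall is on (firewalld is assumed here; adjust for your distribution):

# check whether anything is already listening on port 4000 (tidb) or 9100 (node_exporter)
ss -lntp | grep -E ':(4000|9100)\s'
# check whether the firewall is enabled
systemctl status firewalld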

| username: 大炮的TiDB | Original post link

Thank you. The information has already been anonymized. The firewall is turned off, and the TiDB service is running on the local machine.

| username: db_user | Original post link

Normally, nothing listens on port 4000 unless a start command has been issued; when you start the cluster, if it finds port 4000 already occupied, the start will fail. That's my understanding. In a new environment, you can check how that TiDB process got started, kill it or stop it with systemctl, and then start TiDB with tiup again and see. The --wait-timeout option can increase how long tiup waits for startup.
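For example, something along these lines, first on the affected node and then from the control machine (the service name matches the one in the error message above; the 300-second timeout is only an illustration):

# on 10.22.231.231: see how the existing tidb-server process was started, then stop it
ps -ef | grep tidb-server
systemctl stop tidb-4000.service
# from the tiup control machine: start the cluster again with a longer wait
tiup cluster start tidb-test --wait-timeout 300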

| username: 大炮的TiDB | Original post link

Thank you for your reply. After reinstalling the operating system and deploying again, everything is normal. Reflecting on the process, it wasn't an operational mistake: when I checked the node that couldn't start, its services were all normal, and it was only in tiup that the status showed Down. Before that, I had run the check with the --check option and it passed without issues. It seems the success rate depends on luck (which is quite frustrating).

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.