TiDB Single Node Startup Exception

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 单节点启动异常

| username: 表渣渣渣

A single TiDB node fails to start
[TiDB Usage Environment] Production
[TiDB Version] 5.4
[Problem Encountered] The command tiup cluster start tidb-slb -N xxxx.153:4000 fails to start the node
[Reproduction Path] After a SQL script was executed, CPU usage hit 100% and the node crashed. Since then the node has failed to start.
[Problem Phenomenon and Impact]

Error reported after executing the startup script:
root@:/data/tidb/tidb-deploy/tidb-4000/log# tiup cluster start tidb-slb -N xxxxx:4000
tiup is checking updates for component cluster …
A new version of cluster is available:
The latest version: v1.11.0
Local installed version: v1.10.1
Update current component: tiup update cluster
Update all components: tiup update --all

Starting component cluster: /root/.tiup/components/cluster/v1.10.1/tiup-cluster start tidb-slb -N xxxxx:4000
Starting cluster tidb-slb…

  • [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/tidb-slb/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/tidb-slb/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [Parallel] - UserSSH: user=tidb, host=xxxxx
  • [ Serial ] - StartCluster
    Starting component tidb
    Starting instance xxxxx:4000

Error: failed to start tidb: failed to start: xxxxx tidb-4000.service, please check the instance's log(/data/tidb/tidb-deploy/tidb-4000/log) for more detail.: timed out waiting for port 4000 to be started after 2m0s

Verbose debug logs have been written to /root/.tiup/logs/tiup-cluster-debug-2022-10-12-17-03-19.log.

TiDB log error:
[2022/10/12 16:25:11.548 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=79.927587ms]
[2022/10/12 16:25:13.309 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=58.264043ms]
[2022/10/12 16:25:13.511 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=46.059301ms]
[2022/10/12 16:25:13.512 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=46.152008ms]
[2022/10/12 16:26:00.453 +08:00] [ERROR] [client.go:502] ["[pd] tso request is canceled due to timeout"] [dc-location=global] [error="[PD:client:ErrClientGetTSOTimeout]get TSO timeout"]
[2022/10/12 16:27:50.612 +08:00] [ERROR] [client.go:786] ["[pd] getTS error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]EOF: EOF"]
[2022/10/12 16:29:18.147 +08:00] [INFO] [client.go:730] ["[pd] tso stream is not ready"] [dc=global]
[2022/10/12 16:27:02.731 +08:00] [ERROR] [pd.go:236] ["updateTS error"] [txnScope=global] [error=EOF]

| username: wakaka

Why are the log entries out of order? The 16:29:18 entry appears before the 16:27:02 one. Run systemctl status on the tidb-4000 service to see whether it reports any errors. The logs above don't show why startup failed. Is the tidb-server process running? systemd should restart it automatically.
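To make the ordering question concrete, here is a minimal Python sketch that flags log lines whose timestamp is earlier than the previous line's. It assumes every line starts with a bracketed timestamp in the format shown in the log excerpt; the names parse_ts and out_of_order are made up for this illustration and are not part of any TiDB tooling.

```python
import re
from datetime import datetime

# Matches the leading "[2022/10/12 16:25:11.548 +08:00]" field of a log line.
TS_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]"
)


def parse_ts(line):
    """Return the timestamp of a log line, or None if the line has none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f %z")


def out_of_order(lines):
    """Return (line_number, line) pairs that are older than the line before them."""
    found = []
    prev = None
    for number, line in enumerate(lines, start=1):
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a timestamp, e.g. stack traces
        if prev is not None and ts < prev:
            found.append((number, line))
        prev = ts
    return found
```

Run against the excerpt in the first post, only the final "updateTS error" line (16:27:02.731) would be flagged, since it follows the 16:29:18.147 entry.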

| username: 表渣渣渣

Here is the service status:
● tidb-4000.service - tidb service
Loaded: loaded (/etc/systemd/system/tidb-4000.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2022-10-12 17:31:18 CST; 13s ago
Process: 1147008 ExecStart=/bin/bash -c /data/tidb/tidb-deploy/tidb-4000/scripts/run_tidb.sh (code=exited, status=1/FAILURE)
Main PID: 1147008 (code=exited, status=1/FAILURE)

This service doesn’t seem to have any issues.

| username: Ming

Go to the scripts directory under the deploy directory of that TiDB node and run the startup script manually to see what error it prints.
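As a rough sketch of that suggestion, the script can be launched directly so that its output lands in the terminal instead of being hidden behind systemd. The path below is the one shown in the ExecStart line of the service status earlier in the thread; the helper name run_script is made up for this example, and which OS user to run it as (the deployment in this thread uses the tidb user over SSH) is an assumption to check on your own host.

```python
import subprocess


def run_script(script_path):
    """Run a shell script with bash and return its exit code.

    Output is not captured, so any startup error is printed straight
    to the terminal. If the script starts a long-running server, this
    call blocks until that process exits or is interrupted.
    """
    completed = subprocess.run(["bash", script_path])
    return completed.returncode


# Example call (path taken from the service status in this thread):
# code = run_script("/data/tidb/tidb-deploy/tidb-4000/scripts/run_tidb.sh")
# print("exit code:", code)
```

A non-zero exit code together with the printed message usually says more than the "timed out waiting for port 4000" error from tiup.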

| username: 表渣渣渣

The xxxx IP address belongs to another one of our servers.

| username: srstack

How about executing this?

Also, check the status of PD.

| username: wakaka

This state is abnormal: the service is not running. If issues such as the firewall have been ruled out and no other critical logs can be found, I recommend scaling in this TiDB node and then scaling it out again.
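For reference, the "Active:" line in the status output posted above reads "activating (auto-restart) (Result: exit-code)", which means the process keeps exiting and systemd keeps restarting it. A small Python sketch for reading that line from saved systemctl status output follows; the function names are invented for this illustration, and treating only "active (running)" as healthy is an assumption that fits a long-running service like this one.

```python
import re

# Captures the state word and the first parenthesised sub-state, e.g.
# "Active: activating (auto-restart) (Result: exit-code) since ..."
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\S+)(?:\s+\(([^)]*)\))?", re.MULTILINE)


def active_state(status_text):
    """Return (state, sub_state) from `systemctl status` text, or None."""
    m = ACTIVE_RE.search(status_text)
    if m is None:
        return None
    return m.group(1), m.group(2)


def is_running(status_text):
    """True only when the unit reports 'active (running)'."""
    return active_state(status_text) == ("active", "running")
```

With the output from this thread, active_state returns ("activating", "auto-restart") and is_running returns False.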

| username: 表渣渣渣

The xxxx IP address belongs to another one of our servers.

This is the result after manual execution.

| username: WalterWj

Check the status of PD and look at its logs. TiDB cannot start because it cannot reach PD.
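One quick way to test basic reachability from the TiDB host is a plain TCP connection attempt to the PD endpoint. The sketch below assumes PD listens on its default client port 2379 and uses a placeholder host name; both need to be replaced with the values from your topology. It only shows whether the port accepts connections, not whether PD is healthy or able to serve TSO requests.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (host is a placeholder, 2379 is PD's default client port):
# print(can_connect("pd-host.example", 2379))
```

If this returns False from the TiDB host while PD itself reports as up, the network path or a firewall rule between the two machines is the next thing to check.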

| username: 表渣渣渣

In the end, I had no choice but to scale in the TiDB node and then scale it out again.
