Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: After the TiDB cluster was deployed successfully, startup hung and, after a long time, reported an error.
[Test Environment for TiDB] Testing
[TiDB Version] 7.4.0
[Encountered Issue: Phenomenon and Impact] Unable to start the cluster
[Resource Configuration] Single machine deployment with pseudo-cluster
[Attachments: Screenshots/Logs/Monitoring]
tiup-cluster-debug-2023-10-24-12-28-12.log (60.3 KB)
According to the prompt, check the logs of the corresponding TiKV node.
There is no content in the tikv_stderr.log file under the corresponding tikv node. The content of tikv.log is as follows:
tikv.log (206.2 KB)
For a single-machine deployment, check whether the host is short on resources. Run tiup cluster display <cluster-name> to see whether PD has already crashed…
For a single-machine deployment, you need at least 12GB of memory. I’m not sure if you’ve allocated enough. If the memory is insufficient, do not deploy TiFlash, and just deploy one TiKV.
There is a 10-minute timeout. If an instance does not respond within 10 minutes, the start is reported as a failure. On an under-resourced single-machine deployment this can happen even though the instance has in fact started. You still need to check the logs under the corresponding TiKV node to confirm. If there are no logs, startup is simply slow, so please wait a bit longer.
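To check whether a slow instance eventually comes up instead of relying on the start command's verdict, one option is to poll its port. The following is only a minimal sketch: the helper names `port_open` and `wait_until` are hypothetical and not part of tiup, the 600-second default simply mirrors the 10-minute figure mentioned above, and an open port shows only that the process is listening, not that TiKV is healthy.

```python
import socket
import time
from typing import Callable


def port_open(host: str, port: int, connect_timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_s):
            return True
    except OSError:
        return False


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,   # assumption: mirrors the 10-minute timeout from the thread
    interval_s: float = 5.0,    # assumption: arbitrary polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example call (host and port taken from the start log in this thread):
# wait_until(lambda: port_open("172.16.60.94", 20160))
```

Passing `clock` and `sleep` as parameters is a design choice that keeps the loop testable without real waiting. On the actual host, the tikv.log file remains the authoritative place to confirm startup.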
Please post the TiKV logs.
Single-node deployment means each component has one node. A single replica for KV is sufficient.
The KV service didn’t start, try starting it manually.
It seems that TiKV has a welcome message, which indicates it has started. Use the display command to check the cluster status.
I re-ran the start command and it no longer gets stuck at TiKV, but now it gets stuck at the following. Could starting the cluster be related to the network? I am accessing it from the local machine.
Monitor the server resource status.
Is it that the disk is sufficient but the memory is not?
Bro, you’re planning to set up a TiDB with 4GB of RAM? It won’t even start. Even laptops now come with 16GB of RAM.
[root@hdty-dmdca log]# tiup cluster start tidb-cluster
tiup is checking updates for component cluster …
Starting component cluster: /data/components/cluster/v1.13.1/tiup-cluster start tidb-cluster
Starting cluster tidb-cluster…
- [ Serial ] - SSHKeySet: privateKey=/data/storage/cluster/clusters/tidb-cluster/ssh/id_rsa, publicKey=/data/storage/cluster/clusters/tidb-cluster/ssh/id_rsa.pub
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [Parallel] - UserSSH: user=tidb, host=172.16.60.94
- [ Serial ] - StartCluster
Starting component pd
Starting instance 172.16.60.94:2379
Start instance 172.16.60.94:2379 success
Starting component tikv
Starting instance 172.16.60.94:20160
Starting instance 172.16.60.94:20162
Starting instance 172.16.60.94:20161
Start instance 172.16.60.94:20160 success
Start instance 172.16.60.94:20161 success
Start instance 172.16.60.94:20162 success
Starting component tidb
Starting instance 172.16.60.94:4000
Start instance 172.16.60.94:4000 success
Starting component tiflash
Starting instance 172.16.60.94:9000
Start instance 172.16.60.94:9000 success
Starting component prometheus
Starting instance 172.16.60.94:9090
Start instance 172.16.60.94:9090 success
Starting component grafana
Starting instance 172.16.60.94:3000
Start instance 172.16.60.94:3000 success
Starting component node_exporter
Starting instance 172.16.60.94
Start 172.16.60.94 success
Starting component blackbox_exporter
Starting instance 172.16.60.94
Start 172.16.60.94 success
- [ Serial ] - UpdateTopology: cluster=tidb-cluster
Started cluster tidb-cluster successfully
[root@hdty-dmdca log]#
After running it a few more times, it actually started up.
With 4GB of memory, check the memory usage after you start up. I estimate that you will soon be unable to connect to the machine. Once the memory is used up, you won’t even be able to connect remotely.
[root@hdty-dmdca log]# free -m
              total        used        free      shared  buff/cache   available
Mem:           4675        3904         125          50         645         437
Swap:          8191        6571        1620
[root@hdty-dmdca log]#
The configuration is too low. I ran into the same problem before. You can refer to the following thread: "Getting started with TiDB quickly - Simulating a production cluster deployment on a single machine - Startup failure" - #21, by 有猫万事足 - TiDB Q&A community
There is still 600M available, and 6G of swap has been used.
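To read figures like these without eyeballing the columns, the `free -m` output can be parsed programmatically. This is a small sketch, not an official tool: the function name `parse_free` is hypothetical, and the sample text is copied from the output posted earlier in this thread. In that sample the `available` column reads 437 MB, while 645 MB is the `buff/cache` column.

```python
def parse_free(output: str) -> dict:
    """Parse text printed by `free -m` into {"Mem": {...}, "Swap": {...}} (values in MB)."""
    header = None
    rows = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            # Header row: column names for the rows that follow.
            header = fields
        elif fields[0].endswith(":") and header is not None:
            # Data row such as "Mem:" or "Swap:"; Swap has fewer columns than Mem,
            # and zip() stops at the shorter of the two sequences.
            values = [int(v) for v in fields[1:]]
            rows[fields[0].rstrip(":")] = dict(zip(header, values))
    return rows


# Sample copied from the `free -m` output posted in this thread.
SAMPLE = """\
total used free shared buff/cache available
Mem: 4675 3904 125 50 645 437
Swap: 8191 6571 1620
"""

rows = parse_free(SAMPLE)
swap_used_pct = 100 * rows["Swap"]["used"] / rows["Swap"]["total"]
print(f"available: {rows['Mem']['available']} MB, swap used: {swap_used_pct:.0f}%")
```

With most of the swap already consumed and little memory available, the host matches the low-memory situation described in the replies above.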