TiDB cluster deploys successfully, but startup hangs and eventually fails with an error

translator_bot · June 21, 2024, 5:16pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群部署成功后，启动时卡住，过了很久报错了。

| username: 哈喽沃德

[Test Environment for TiDB] Testing
[TiDB Version] 7.4.0
[Encountered Issue: Phenomenon and Impact] Unable to start the cluster
[Resource Configuration] Single machine deployment with pseudo-cluster
[Attachments: Screenshots/Logs/Monitoring]

tiup-cluster-debug-2023-10-24-12-28-12.log (60.3 KB)

| username: 啦啦啦啦啦

As the error message suggests, check the logs on the corresponding TiKV node.

| username: 哈喽沃德

The tikv_stderr.log file on that TiKV node is empty. Here is tikv.log:
tikv.log (206.2 KB)

| username: tidb菜鸟一只

With a single-machine deployment, check whether the host is short on resources. Run tiup cluster display <cluster-name> to see whether PD has already crashed…

| username: zhanggame1

A single-machine deployment needs at least 12 GB of memory, and I'm not sure you've allocated that much. If memory is short, skip TiFlash and deploy only one TiKV instance.
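To compare a Linux host against the 12 GB figure above, here is a minimal Python sketch. It assumes the host exposes /proc/meminfo (standard on Linux); the helper names are hypothetical, 12 is treated as GiB, and the threshold is this reply's rule of thumb rather than an official TiDB requirement.

```python
def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo-style text, converted to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports "kB", i.e. KiB
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def enough_for_single_host(meminfo_text: str, min_gib: float = 12.0) -> bool:
    """Hypothetical check: does total RAM meet the forum's 12 GB rule of thumb?"""
    return mem_total_gib(meminfo_text) >= min_gib


def read_local_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the live meminfo file on a Linux host."""
    with open(path) as f:
        return f.read()
```

On the target machine, enough_for_single_host(read_local_meminfo()) gives a quick yes/no. Parsing is kept separate from file access so the check can be tried on pasted text as well.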

| username: Kongdom

There is a 10-minute timeout: if an instance doesn't respond within 10 minutes, the start is treated as failed. On an under-resourced single-machine deployment this can happen even though the instance actually started. You still need to check the logs on the TiKV node to confirm. If they show no errors, it's just a slow start, so wait a bit longer.
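A timeout does not necessarily mean the instance failed to start. As a rough way to wait it out, here is a minimal Python sketch, assuming you only want to know whether a TiKV port eventually accepts TCP connections. wait_for_port is a hypothetical helper, not part of tiup, and its 600-second default simply mirrors the 10-minute figure in this reply.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 600.0,
                  interval_s: float = 5.0) -> bool:
    """Poll host:port until a TCP connect succeeds or timeout_s elapses.

    Returns True as soon as something accepts the connection, False on timeout.
    An open port only shows the process is listening; it does not prove the
    TiKV instance is healthy, so still check its logs.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        try:
            # Bound each attempt so a silent host can't stall past the deadline.
            with socket.create_connection(
                (host, port), timeout=max(0.1, min(interval_s, remaining))
            ):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: not up yet
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

For example, wait_for_port("172.16.60.94", 20160, timeout_s=900) would keep checking the first TiKV instance in this thread for up to 15 minutes. Running tiup cluster display afterwards is still the authoritative status check.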

| username: 路在何chu

Please post the TiKV logs.

| username: 像风一样的男子

For a single-node deployment, run one instance of each component. A single replica for TiKV is enough.

| username: Fly-bird

The TiKV service didn't start; try starting it manually.

| username: Kongdom

TiKV appears to have printed its welcome message, which means it has started. Use the display command to check the cluster status.

| username: 哈喽沃德

I ran it again. It's no longer stuck at TiKV, but now it's stuck at the following step. Could starting the cluster depend on the network? I'm accessing it from the local machine.

| username: 像风一样的男子

Check the server's resource usage.

| username: 哈喽沃德

So the disk is sufficient, but there isn't enough memory?

| username: 像风一样的男子

Bro, you're planning to run TiDB on 4 GB of RAM? It won't even start. Even laptops come with 16 GB of RAM these days.

| username: 哈喽沃德

[root@hdty-dmdca log]# tiup cluster start tidb-cluster
tiup is checking updates for component cluster …
Starting component cluster: /data/components/cluster/v1.13.1/tiup-cluster start tidb-cluster
Starting cluster tidb-cluster…

[ Serial ] - SSHKeySet: privateKey=/data/storage/cluster/clusters/tidb-cluster/ssh/id_rsa, publicKey=/data/storage/cluster/clusters/tidb-cluster/ssh/id_rsa.pub
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[Parallel] - UserSSH: user=tidb, host=172.16.60.94
[ Serial ] - StartCluster
Starting component pd
Starting instance 172.16.60.94:2379
Start instance 172.16.60.94:2379 success
Starting component tikv
Starting instance 172.16.60.94:20160
Starting instance 172.16.60.94:20162
Starting instance 172.16.60.94:20161
Start instance 172.16.60.94:20160 success
Start instance 172.16.60.94:20161 success
Start instance 172.16.60.94:20162 success
Starting component tidb
Starting instance 172.16.60.94:4000
Start instance 172.16.60.94:4000 success
Starting component tiflash
Starting instance 172.16.60.94:9000
Start instance 172.16.60.94:9000 success
Starting component prometheus
Starting instance 172.16.60.94:9090
Start instance 172.16.60.94:9090 success
Starting component grafana
Starting instance 172.16.60.94:3000
Start instance 172.16.60.94:3000 success
Starting component node_exporter
Starting instance 172.16.60.94
Start 172.16.60.94 success
Starting component blackbox_exporter
Starting instance 172.16.60.94
Start 172.16.60.94 success
[ Serial ] - UpdateTopology: cluster=tidb-cluster
Started cluster tidb-cluster successfully
[root@hdty-dmdca log]#

| username: 哈喽沃德

After running it a few more times, it did start.

| username: tidb菜鸟一只

With 4 GB of memory, check memory usage after startup. My guess is that you'll soon be unable to reach the machine: once memory runs out, you won't even be able to connect remotely.

| username: 哈喽沃德

[root@hdty-dmdca log]# free -m
              total        used        free      shared  buff/cache   available
Mem:           4675        3904         125          50         645         437
Swap:          8191        6571        1620
[root@hdty-dmdca log]#
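To pull the relevant numbers out of the free -m output above, here is a small Python sketch. parse_free_m and memory_summary are hypothetical helpers written for this illustration. Note that free's available column (437 MiB here) is the kernel's estimate of memory that new processes can use without swapping.

```python
SAMPLE = """
              total        used        free      shared  buff/cache   available
Mem:           4675        3904         125          50         645         437
Swap:          8191        6571        1620
"""


def parse_free_m(output: str) -> dict:
    """Parse `free -m` text into {"Mem": {...}, "Swap": {...}} with MiB integers."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split()  # total used free shared buff/cache available
    rows = {}
    for ln in lines[1:]:
        label, *values = ln.split()
        # zip() stops early, so the shorter Swap row only gets total/used/free
        rows[label.rstrip(":")] = {name: int(v) for name, v in zip(header, values)}
    return rows


def memory_summary(rows: dict) -> dict:
    """Report total RAM, kernel-estimated available RAM, and swap usage in percent."""
    mem, swap = rows["Mem"], rows["Swap"]
    swap_pct = round(100 * swap["used"] / swap["total"], 1) if swap["total"] else 0.0
    return {
        "total_mib": mem["total"],
        "available_mib": mem["available"],
        "swap_used_pct": swap_pct,
    }


print(memory_summary(parse_free_m(SAMPLE)))
# → {'total_mib': 4675, 'available_mib': 437, 'swap_used_pct': 80.2}
```

Roughly 80% of swap in use alongside about 437 MiB of available RAM is consistent with the replies above warning that this host is badly short of memory.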

| username: 随缘天空

Your machine's specs are too low. I ran into the same problem before; see this thread: Getting Started with TiDB - Simulating a Production Cluster Deployment on a Single Machine - Startup Failure (#21, from 有猫万事足, TiDB Q&A Community)

| username: 哈喽沃德

There's still about 600 MB available, and 6 GB of swap is in use.