TiDB fails to start when starting the cluster in a single-machine deployment

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 单机部署,启动集群的时候,tidb无法启动

| username: TiDBer_lDlxBO2I

[TiDB Usage Environment] Single machine deployment
[TiDB Version] v6.5.1
[Reproduction Path]
tiup cluster deploy tidb-test v6.5.1 ./topo.yaml --user root -p (completes normally)
tiup cluster start tidb-test --init (fails with an error)
[Encountered Problem: Phenomenon and Impact]
Cluster startup error, TiDB cannot start:
Error: failed to start tidb: failed to start: tidb-4000.service, please check the instance’s log(/tidb-deploy/tidb-4000/log) for more detail.: timed out waiting for port 4000 to be started after 2m0s
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
Error message screenshot:

Cluster status:

I have uninstalled and reinstalled several times and get the same error every time. There is no port conflict, and both the firewall and SELinux are disabled, yet the cluster still will not start. Has anyone run into the same problem? Any guidance would be appreciated.
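
For anyone who wants to double-check the "no port conflict" part, here is a minimal sketch. It assumes the `ss` tool from iproute2 is available; the helper name `port_is_listening` is made up for illustration and simply scans `ss -ltn`-style output for a local port.

```shell
# Hypothetical helper: succeed if something is listening on the given TCP port.
# Reads `ss -ltn`-style output on stdin; the local address:port is column 4.
port_is_listening() {
  awk -v port="$1" '
    NR > 1 {
      # Take the text after the last ":" so IPv4 and IPv6 addresses both work.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example, run before starting the cluster:
# ss -ltn | port_is_listening 4000 && echo "port 4000 is already in use"
```

If the check reports nothing on port 4000 before startup, a port conflict is unlikely to be the cause of the timeout.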

| username: caiyfc | Original post link

Check the operating system log for errors.

| username: Kongdom | Original post link

Please send the logs under tidb-4000/log.
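
When sharing logs, it usually helps to pull out only the error-level lines first. A minimal sketch, assuming the log path from the error message in the first post and that TiDB writes level tags such as `[ERROR]` and `[FATAL]` literally into each line; the helper name `tidb_log_errors` is hypothetical.

```shell
# Hypothetical helper: print the last N error-level lines of a TiDB log file.
# Assumes level tags such as [ERROR] and [FATAL] appear literally in each line.
tidb_log_errors() {
  log_file="$1"
  max_lines="${2:-20}"
  grep -E '\[(ERROR|FATAL)\]' "$log_file" | tail -n "$max_lines"
}

# Example (path taken from the error message in the first post):
# tidb_log_errors /tidb-deploy/tidb-4000/log/tidb.log 50
```

On the operating system side, `journalctl -u tidb-4000.service` should show what systemd recorded for the unit, assuming the host uses the systemd journal.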

| username: TiDBer_lDlxBO2I | Original post link

This is tidb.log after I restarted.

| username: TiDBer_lDlxBO2I | Original post link

Hello, this is the log file under tidb-4000.

| username: 魔礼养羊 | Original post link

From your logs, it appears that your TiDB server failed during bootstrap; it never came up at all.

Do you have monitoring on your server? What did CPU and memory usage look like during cluster startup?

I previously tried a single-machine deployment on Kingsoft Cloud, and a 4-core, 8 GB instance could not start the cluster.

The recommended configuration is as follows:

| Component | CPU | Memory | Local Storage | Network | Minimum Instance Count |
|---|---|---|---|---|---|
| TiDB | 8 cores+ | 16 GB+ | No special requirements | Gigabit NIC | 1 (can be on the same machine as PD) |
| PD | 4 cores+ | 8 GB+ | SAS, 200 GB+ | Gigabit NIC | 1 (can be on the same machine as TiDB) |
| TiKV | 8 cores+ | 32 GB+ | SSD, 200 GB+ | Gigabit NIC | 3 |
| TiFlash | 32 cores+ | 64 GB+ | SSD, 200 GB+ | Gigabit NIC | 1 |
| TiCDC | 8 cores+ | 16 GB+ | SAS, 200 GB+ | Gigabit NIC | 1 |

| username: TiDBer_lDlxBO2I | Original post link

I can’t reproduce a configuration that large at the moment because I have no cloud or physical servers available. I’m deploying on a VMware virtual machine with 8 cores and 8 GB of memory, and when I watch the load during startup, the CPU is not fully utilized. I’ll try increasing the virtual machine’s resources to see whether the error persists. The main difficulty is that relevant error information is hard to find online. Searching here, I found other users whose TiDB instance on port 4000 also failed at this startup step, but in their cases a timestamp problem prevented startup. The problems are similar but not identical.

| username: TiDBer_lDlxBO2I | Original post link

PD and TiKV, which start first, came up normally, but the TiDB server failed to start, so startup was aborted and the monitoring components that follow it were never started.

| username: tidb菜鸟一只 | Original post link

Check the machine’s resource load; this is probably a resource shortage.
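
A quick way to capture the numbers worth posting, as a rough sketch. It assumes a Linux host with `nproc` and a `/proc/meminfo` that exposes `MemTotal` and `MemAvailable`; the helper name `resource_summary` is made up for illustration.

```shell
# Hypothetical helper: print CPU count and memory figures on one line.
# Reads /proc/meminfo, so it assumes a Linux host.
resource_summary() {
  cores=$(nproc)
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  # Fall back to 0 if a field is missing so the arithmetic below cannot fail.
  total_kb="${total_kb:-0}"
  avail_kb="${avail_kb:-0}"
  echo "cores=$cores mem_total_mb=$((total_kb / 1024)) mem_available_mb=$((avail_kb / 1024))"
}

# Example: run once before and once during `tiup cluster start`:
# resource_summary
```

Comparing the output taken during startup with the recommended figures makes it easier to tell whether memory, rather than CPU, is the limiting resource.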

| username: TiDBer_lDlxBO2I | Original post link

This is the load when I start TiDB on its own. As it shows, the virtual machine has 8 cores and 12 GB of memory; about 4 GB of memory is still free, and none of the CPU cores is fully utilized.

| username: TiDBer_lDlxBO2I | Original post link

This is the running status of the cluster components, and all other components are running normally.

| username: CuteRay | Original post link

Check the virtual machine disk information; it looks like the bootstrap failed due to insufficient disk space.

By the way, for single-machine deployment, it’s best to use tiup playground to start a single-machine cluster.
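
For reference, a minimal sketch of what that could look like. The component counts are only an example, and the wrapper name `start_playground` and the `TIUP` override variable are hypothetical conveniences, not part of tiup itself.

```shell
# Hypothetical wrapper around `tiup playground` for a minimal local cluster.
# TIUP can be overridden (for example TIUP=echo) to print the arguments
# instead of actually starting anything.
start_playground() {
  version="${1:-v6.5.1}"
  # One TiDB, one PD, one TiKV, and no TiFlash, to keep resource use low.
  "${TIUP:-tiup}" playground "$version" --db 1 --pd 1 --kv 1 --tiflash 0
}

# Example: start_playground v6.5.1
```

The playground keeps running in the foreground, so a second terminal is needed to connect a MySQL client to it.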

| username: TiDBer_lDlxBO2I | Original post link

As for the disk, the deployment directory currently has 1.7G of space left. I’ll try creating a new virtual machine with a larger disk.

| username: Running | Original post link

If the hardware is insufficient, you can try using TiCloud for testing.

| username: xingzhenxiang | Original post link

Take a quick look at tiup playground.

| username: CuteRay | Original post link

Yes, make the disk larger; it should be at least 50G.
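
To verify the free space before redeploying, something like the following sketch may help. It assumes the deployment directory is `/tidb-deploy`, as the path in the error message suggests; the helper name `has_free_gb` is made up for illustration.

```shell
# Hypothetical helper: succeed if the filesystem holding PATH has at least MIN_GB free.
has_free_gb() {
  path="$1"
  min_gb="$2"
  # df -P -k prints POSIX-format output in 1 KiB blocks; column 4 is "Available".
  avail_kb=$(df -P -k "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example, using the 50G figure suggested above:
# has_free_gb /tidb-deploy 50 && echo "enough space" || echo "disk too small"
```

If the data directory sits on a different filesystem from the deployment directory, it is worth checking both paths.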