basic-tidb-0 Keeps Failing to Start: CrashLoopBackOff

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: basic-tidb-0 一直启动失败 :CrashLoopBackOff

| username: TiDB-CXD

[TiDB Usage Environment] Production Environment / Testing / PoC
Testing environment on kind
[TiDB Version]
6.5
[Reproduction Path] What operations were performed to cause the issue
Followed the official tutorial to set up the cluster using kind
[Encountered Issue: Issue Phenomenon and Impact]
root@ubuntu:/home/cxd# kubectl get pods -n tidb-cluster
NAME READY STATUS RESTARTS AGE
basic-discovery-5fbdb874d8-9btwp 1/1 Running 5 14h
basic-monitor-0 4/4 Running 16 14h
basic-pd-0 1/1 Running 4 14h
basic-tidb-0 1/2 CrashLoopBackOff 16 12h
basic-tidb-dashboard-0 1/1 Running 4 14h
basic-tikv-0 1/1 Running 4 13h

[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
[terror.go:300] ["unexpected error"] [error="path \"/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b\" is not a descendant of mount point root \"/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b/kubelet\" and cannot be exposed from \"/sys/fs/cgroup/rdma/kubelet\""] [stack="github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

| username: TiDB-CXD | Original post link

First time using it, cloud newbie here, hoping for some advice from the experts.

| username: TiDB-CXD | Original post link

I wanted to install and get it running. It took me more than a day, and now I’m at this step. It feels like I’m just one step away.

| username: TiDB-CXD | Original post link

Stuck for a long time, haven’t found a good solution, hope everyone can give some suggestions.

| username: TiDB-CXD | Original post link

Or give some ideas, and I will investigate in that direction.

| username: TiDB-CXD | Original post link

This is a kind test setup, and I am running it inside a virtual machine.

| username: TiDB-CXD | Original post link

|Host| |
|---|---|
|Processor|11th Gen Intel(R) Core™ i5-1135G7 @ 2.40GHz 2.42 GHz|
|Installed RAM|16.0 GB (15.7 GB usable)|
|System type|64-bit operating system, x64-based processor|

| username: yiduoyunQ | Original post link

Run kubectl describe on the pod to confirm the error. See: Common Deployment Failures of TiDB on Kubernetes | PingCAP Docs

| username: TiDB-CXD | Original post link

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12h default-scheduler Successfully assigned tidb-cluster/basic-tidb-0 to kind-control-plane
Normal Pulled 12h kubelet Container image "alpine:3.16.0" already present on machine
Normal Created 12h kubelet Created container slowlog
Normal Started 12h kubelet Started container slowlog
Normal Pulled 12h (x2 over 12h) kubelet Container image "uhub.service.ucloud.cn/pingcap/tidb:v6.5.0" already present on machine
Normal Created 12h (x2 over 12h) kubelet Created container tidb
Normal Started 12h (x2 over 12h) kubelet Started container tidb
Warning BackOff 12h (x3 over 12h) kubelet Back-off restarting failed container
Normal SandboxChanged 12h kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 12h kubelet Container image "alpine:3.16.0" already present on machine
Normal Created 12h kubelet Created container slowlog
Normal Started 12h kubelet Started container slowlog
Normal Pulled 12h (x3 over 12h) kubelet Container image "uhub.service.ucloud.cn/pingcap/tidb:v6.5.0" already present on machine
Normal Created 12h (x3 over 12h) kubelet Created container tidb
Normal Started 12h (x3 over 12h) kubelet Started container tidb
Warning BackOff 12h (x6 over 12h) kubelet Back-off restarting failed container
Normal SandboxChanged 40m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 40m kubelet Container image "alpine:3.16.0" already present on machine
Normal Created 40m kubelet Created container slowlog
Normal Started 40m kubelet Started container slowlog
Normal Pulled 39m (x4 over 40m) kubelet Container image "uhub.service.ucloud.cn/pingcap/tidb:v6.5.0" already present on machine
Normal Created 39m (x4 over 40m) kubelet Created container tidb
Normal Started 39m (x4 over 40m) kubelet Started container tidb
Warning BackOff 16s (x185 over 40m) kubelet Back-off restarting failed container
root@ubuntu:/home/cxd#

| username: TiDB-CXD | Original post link

The above is the output of kubectl describe pod basic-tidb-0 -n tidb-cluster.

| username: TiDB-CXD | Original post link

Output of kubectl logs -n tidb-cluster -f basic-tidb-0 -c tidb:

start tidb-server ...
/tidb-server --store=tikv --advertise-address=basic-tidb-0.basic-tidb-peer.tidb-cluster.svc --host=0.0.0.0 --path=basic-pd:2379 --config=/etc/tidb/tidb.toml --log-slow-query=/var/log/tidb/slowlog
[2023/04/24 01:34:12.357 +00:00] [INFO] [cpuprofile.go:113] ["parallel cpu profiler started"]
[2023/04/24 01:34:12.357 +00:00] [FATAL] [terror.go:300] ["unexpected error"] [error="path \"/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b\" is not a descendant of mount point root \"/docker/2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b/kubelet\" and cannot be exposed from \"/sys/fs/cgroup/rdma/kubelet\""] [stack="github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:300\nmain.setGlobalVars\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:615\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:208\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
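
The FATAL line is easier to read once the escaped error field is pulled out. The following is a minimal Python sketch, not part of the original post, that extracts the `[error="..."]` field and lists the three paths it names. The regular expressions and the `extract_error` helper are illustrative assumptions, and the two `[stack=...]` fields are left out of the sample for brevity.

```python
import re

# Container ID copied from the log line above.
CID = "2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b"

# The FATAL line from the tidb container log, without the [stack=...] fields.
LOG_LINE = (
    r'[2023/04/24 01:34:12.357 +00:00] [FATAL] [terror.go:300] ["unexpected error"] '
    rf'[error="path \"/docker/{CID}\" is not a descendant of mount point root '
    rf'\"/docker/{CID}/kubelet\" and cannot be exposed from '
    r'\"/sys/fs/cgroup/rdma/kubelet\""]'
)

# Match [error="..."], allowing backslash-escaped characters inside the quotes.
ERROR_FIELD = re.compile(r'\[error="((?:[^"\\]|\\.)*)"\]')


def extract_error(line: str) -> str:
    """Return the unescaped text of the [error="..."] field (hypothetical helper)."""
    match = ERROR_FIELD.search(line)
    if match is None:
        raise ValueError("no [error=...] field found")
    return match.group(1).replace('\\"', '"')


message = extract_error(LOG_LINE)
# Every quoted string that starts with "/" is one of the paths in the message.
paths = re.findall(r'"(/[^"]*)"', message)
for label, path in zip(("cgroup path", "mount root", "mount point"), paths):
    print(f"{label}: {path}")
```

Read this way, the message says that the cgroup path of the process sits one level above the root of the rdma cgroup mount, rather than inside it.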

| username: TiDB-CXD | Original post link

The VM is allocated 10 GB of memory and 8 cores.

| username: dba-kit | Original post link

It looks like a directory permission issue. What does your deployment configuration file look like?

| username: TiDB-CXD | Original post link

Um… I followed the steps on the official website:

| username: yiduoyunQ | Original post link

Is there any special cgroup configuration? Some TiDB features need to read the node's memory information first.

| username: TiDB-CXD | Original post link

No configuration has been done; it's a freshly installed virtual machine.

| username: TiDB-CXD | Original post link

#subsys_name hierarchy num_cgroups enabled
cpuset 9 64 1
cpu 6 182 1
cpuacct 6 182 1
blkio 5 182 1
memory 7 276 1
devices 4 183 1
freezer 11 65 1
net_cls 2 64 1
perf_event 3 64 1
net_prio 2 64 1
hugetlb 12 64 1
pids 8 187 1
rdma 13 6 1
misc 10 1 1
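
The listing above has the column layout of /proc/cgroups, although the post does not name the command that produced it, so that is an assumption. As a minimal sketch, the Python below parses text in that layout and reports which controllers are attached to a cgroup v1 hierarchy. The `Controller` type and `parse_proc_cgroups` function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Controller(NamedTuple):
    name: str
    hierarchy: int
    num_cgroups: int
    enabled: bool


def parse_proc_cgroups(text: str) -> List[Controller]:
    """Parse text in the /proc/cgroups layout into Controller records."""
    controllers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the header and blank lines
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers.append(
            Controller(name, int(hierarchy), int(num_cgroups), enabled == "1")
        )
    return controllers


# The listing from the post above.
SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpuset 9 64 1
cpu 6 182 1
cpuacct 6 182 1
blkio 5 182 1
memory 7 276 1
devices 4 183 1
freezer 11 65 1
net_cls 2 64 1
perf_event 3 64 1
net_prio 2 64 1
hugetlb 12 64 1
pids 8 187 1
rdma 13 6 1
misc 10 1 1
"""

controllers = parse_proc_cgroups(SAMPLE)
# A non-zero hierarchy ID means the controller is mounted on a cgroup v1 hierarchy.
v1_controllers = [c.name for c in controllers if c.hierarchy != 0]
print(f"{len(v1_controllers)} of {len(controllers)} controllers are on v1 hierarchies")
```

Every controller in the listing has a non-zero hierarchy ID, which suggests the VM uses cgroup v1. That is consistent with the per-controller path /sys/fs/cgroup/rdma/kubelet in the error message.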

| username: dba-kit | Original post link

Try creating a Docker container directly in the virtual machine to see if there are any errors. I suspect it might be because you are running Docker in a virtual machine, whereas the official documentation runs Docker directly on the local machine.

| username: TiDB-CXD | Original post link

Does this prove it?
root@ubuntu:/home/cxd# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest feb5d9fea6a5 19 months ago 13.3kB
kindest/node 094599011731 2 years ago 1.17GB
root@ubuntu:/home/cxd# docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:

  1. The Docker client contacted the Docker daemon.
  2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
  3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
  4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

root@ubuntu:/home/cxd#

| username: yiduoyunQ | Original post link

The error is raised from tidb-server/main.go at v6.5.0 (pingcap/tidb on GitHub). The package involved appears to be the one used to "Automatically set GOMAXPROCS to match Linux container CPU quota." My guess is that the VM environment is still the cause.
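
To make the error message concrete, here is a minimal Python sketch of the kind of check that would produce it. This is an assumption about the mechanism, not code taken from TiDB or its dependencies: a cgroup path is translated into a location under a mount point by taking it relative to the mount's root, and the translation fails when the path lies outside that root. The function name `expose_cgroup_path` is hypothetical; the paths are the ones from the log.

```python
import posixpath


def expose_cgroup_path(cgroup_path: str, mount_root: str, mount_point: str) -> str:
    """Map a cgroup path onto a mount point, or raise if it lies outside the mount root."""
    rel = posixpath.relpath(cgroup_path, mount_root)
    if rel == ".." or rel.startswith("../"):
        # The path is above or beside the root, so the mount cannot show it.
        raise ValueError(
            f'path "{cgroup_path}" is not a descendant of mount point root '
            f'"{mount_root}" and cannot be exposed from "{mount_point}"'
        )
    return posixpath.normpath(posixpath.join(mount_point, rel))


# Container ID and paths copied from the FATAL log line.
CID = "2e800829730d53f792c9ac0b32a64ff153094e9b7df0208d2bc9a14d31f4526b"
MOUNT_ROOT = f"/docker/{CID}/kubelet"
MOUNT_POINT = "/sys/fs/cgroup/rdma/kubelet"

try:
    # The process's rdma cgroup is the parent of the mount root, so this fails.
    expose_cgroup_path(f"/docker/{CID}", MOUNT_ROOT, MOUNT_POINT)
except ValueError as exc:
    print(exc)

# A path below the mount root (hypothetical child "pod1") maps cleanly.
print(expose_cgroup_path(f"{MOUNT_ROOT}/pod1", MOUNT_ROOT, MOUNT_POINT))
```

Under this reading, the failure comes from how the rdma cgroup is mounted inside the kind node container, not from Docker itself, which would explain why docker run hello-world succeeds on the same VM.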