Cluster deployed on a single machine to simulate a production environment fails to start after successful deployment

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在单机上模拟部署生产环境集群,部署成功后启动不了

| username: zhimadi

[TiDB Usage Environment] Test
[TiDB Version] v5.4.2
[Reproduction Path] Deploy a new test cluster using tiup
[Encountered Problem: Issue Phenomenon and Impact]
After successful deployment, the cluster cannot start. Reference link: Simulate Deployment of Production Environment Cluster on a Single Machine
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
I am simulating a production cluster deployment on a single machine with 1 PD, 1 TiKV, and 1 TiDB. The deployment succeeds, but the cluster cannot start.
The configuration file was generated with: tiup cluster template > topology.yaml

Cluster test-cluster deployed successfully, you can start it with command: tiup cluster start test-cluster --init
[test-deploy@localhost ~]$ tiup cluster start test-cluster --init

Error after executing the command tiup cluster start test-cluster --init:
Error: failed to start tidb: failed to start: 10.0.0.56 tidb-4000.service, please check the instance's log(/data/tidb/deploy/tidb-4000/log) for more detail.: timed out waiting for port 4000 to be started after 2m0s

Details from tidb_stderr.log:
{"level":"warn","ts":"2023-04-25T17:52:05.278+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-0c043b83-12c8-450e-91e8-546ff1efdc93/10.0.0.56:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x28148d7]

goroutine 1 [running]:
github.com/pingcap/tidb/ddl.(*ddl).close(0xc000bff260)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl.go:399 +0x77
github.com/pingcap/tidb/ddl.(*ddl).Stop(0xc000bff260, 0x0, 0x0)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl.go:327 +0x8a
github.com/pingcap/tidb/domain.(*Domain).Close(0xc0007fbe00)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/domain/domain.go:695 +0x377
github.com/pingcap/tidb/session.(*domainMap).Get.func1(0xc0013bf1a0, 0xc00133f8d8, 0x1369afc)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/tidb.go:86 +0x69e
github.com/pingcap/tidb/util.RunWithRetry(0x1e, 0x1f4, 0xc00189f938, 0x18, 0x6468280)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/misc.go:65 +0x7f
github.com/pingcap/tidb/session.(*domainMap).Get(0x642b450, 0x4538850, 0xc0006fbbd0, 0xc0007fbe00, 0x0, 0x0)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/tidb.go:71 +0x1f0
github.com/pingcap/tidb/session.createSessionWithOpt(0x4538850, 0xc0006fbbd0, 0x0, 0x0, 0xc10a068d909177ba, 0x34ef6bc)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2767 +0x59
github.com/pingcap/tidb/session.createSession(...)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2763
github.com/pingcap/tidb/session.runInBootstrapSession(0x4538850, 0xc0006fbbd0, 0x4097f48)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2746 +0x59
github.com/pingcap/tidb/session.BootstrapSession(0x4538850, 0xc0006fbbd0, 0x0, 0x0, 0x0)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2593 +0xfef
main.createStoreAndDomain(0x64312a0, 0x3ff6a97, 0x2c)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:296 +0x189
main.main()
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/tidb-server/main.go:202 +0x29e
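The warning just before the panic shows TiDB's etcd client timing out against PD at 10.0.0.56:2379, so one reasonable first step is to confirm that something is actually listening on that port. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command; port_open is a hypothetical helper written for illustration, not part of tiup.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Hypothetical helper for illustration: it only tests TCP reachability,
# not whether the PD service behind the port is healthy.
port_open() {
  local host="$1" port="$2" timeout_s="${3:-3}"
  # bash's /dev/tcp pseudo-device opens a TCP connection on redirect;
  # `timeout` bounds the wait so a filtered port does not hang the check.
  timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Run on the TiDB host, `port_open 10.0.0.56 2379 && echo "PD port reachable" || echo "PD port not reachable"` would show whether the PD client port from the log accepts connections. If it does not, the PD instance's own log is probably the next place to look.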

| username: Billmay表妹 | Original post link

What operating system are you running, and what are the machine's resources?

Did you check the official configuration requirements? TiDB Software and Hardware Recommendations | PingCAP Documentation Center

| username: zhimadi | Original post link

44 cores, 128GB of memory.

| username: Hacker_ufuLjDKs | Original post link

Having money is good.

| username: 我是咖啡哥 | Original post link

It looks like a memory allocation address conflict?

The v5.4.3 release notes mention a bug fix; I'm not sure whether it's related. Maybe try a different version?

  • Fixed an issue where executing SHOW WARNINGS might report invalid memory address or nil pointer dereference #31569

| username: zhimadi | Original post link

It's just a single-machine deployment. :joy: I bought the machine to test TiFlash. I don't have much money, so I only bought two second-hand machines: one runs PD, TiKV, and TiDB together, and the other runs TiFlash.

| username: zhimadi | Original post link

Our production environment is using this version. :sob:

| username: tidb菜鸟一只 | Original post link

This is generally a Go error triggered by an invalid memory address or nil pointer dereference.

| username: 我是咖啡哥 | Original post link

Use the tiup cluster check command to see whether anything in the environment fails to meet the requirements.
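For reference, here is a minimal sketch of how such a check is commonly run with tiup cluster check. The file name topology.yaml and the cluster name test-cluster come from the original post; the TIUP variable and the two wrapper functions are hypothetical conveniences, not part of tiup. Depending on the SSH setup, extra flags such as --user may also be needed.

```shell
#!/usr/bin/env bash
# TIUP lets the binary be overridden (for example with `echo` for a dry run).
# This variable is an assumption of the sketch, not something tiup defines.
TIUP="${TIUP:-tiup}"

# Check the target machines against a topology file before deployment.
check_topology() {
  "$TIUP" cluster check "$1"
}

# Check the machines of an already deployed cluster, addressed by cluster name.
check_deployed_cluster() {
  "$TIUP" cluster check "$1" --cluster
}
```

Typical use would be `check_topology ./topology.yaml` before deploying and `check_deployed_cluster test-cluster` afterwards. Items reported as Fail are the ones worth fixing before starting the cluster again.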

| username: zhimadi | Original post link

Also, our other clusters occasionally hit Go errors during maintenance for no clear reason.

| username: zhimadi | Original post link

The check definitely passes.

| username: 逍遥_猫 | Original post link

OP, how did you finally solve it? What was the reason?

| username: zhimadi | Original post link

I don't know what caused the error. After trying a few more times it worked. It's only a test environment I set up :sweat_smile:

| username: zhanggame1 | Original post link

If you can't figure it out, reinstalling the system and rebuilding the cluster is the fastest fix.