The TiDB node status is Down, and the Down TiDB node cannot be started

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb节点状态为 Down ,无法将 Down 的 tidb 节点启动起来

| username: withseid

Background
In a TiDB cluster with 3 TiDB nodes, one of the nodes went down. The suspected cause is a delete operation that was executed without first checking how much data it would affect; the delete turned out to involve too much data, which brought one of the TiDB nodes down.

After discovering that a TiDB node was down, I used tiup cluster restart --node ip:port to try to restart it, but the following error message was displayed:

Error: failed to start tidb: failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.: timed out waiting for port 14000 to be started after 2m0s

Verbose debug logs have been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2022-08-23-22-06-23.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.7.0/tiup-cluster` (wd:/home/tidb/.tiup/data/TFLI7w1) failed: exit status 1
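
Before taking more drastic action, it can help to look at what the service itself reports on the affected host. A minimal diagnostic sketch, assuming SSH access to 10.20.70.39 and the unit name tidb-14000.service from the error above (the exact log file names under the log directory are an assumption about the usual TiUP layout):

# On 10.20.70.39: check the systemd unit state and the most recent TiDB log output
systemctl status tidb-14000.service
tail -n 200 /ssd/tidb-deploy/tidb-14000/log/tidb.log          # main server log (typical name, assumed)
tail -n 100 /ssd/tidb-deploy/tidb-14000/log/tidb_stderr.log   # startup errors often land here (assumed)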

Key information from the /home/tidb/.tiup/logs/tiup-cluster-debug-2022-08-23-22-06-23.log file:

2022-08-23T22:06:23.264+0800    DEBUG   retry error: operation timed out after 2m0s
2022-08-23T22:06:23.264+0800    DEBUG   TaskFinish      {"task": "StartCluster", "error": "failed to start tidb: failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.: timed out waiting for port 14000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 14000 to be started after 2m0s\
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:115\
github.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:147\
github.com/pingcap/tiup/pkg/cluster/operation.startInstance\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:359\
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:485\
golang.org/x/sync/errgroup.(*Group).Go.func1\
\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\
runtime.goexit\
\truntime/asm_amd64.s:1581\
failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.\
failed to start tidb"}
2022-08-23T22:06:23.265+0800    INFO    Execute command finished        {"code": 1, "error": "failed to start tidb: failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.: timed out waiting for port 14000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 14000 to be started after 2m0s\
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:115\
github.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:147\
github.com/pingcap/tiup/pkg/cluster/operation.startInstance\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:359\
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:485\
golang.org/x/sync/errgroup.(*Group).Go.func1\
\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\
runtime.goexit\
\truntime/asm_amd64.s:1581\
failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.\
failed to start tidb"}

Checking the TiDB log shows the following:

:32.719 +08:00] [INFO] [domain.go:506] ["globalConfigSyncerKeeper exited."]
[2022/08/23 19:27:32.719 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1187] ["PlanReplayerLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:258] ["break campaign loop, context is done"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:933] ["loadPrivilegeInLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1165] ["TelemetryRotateSubWindowLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:345] ["watcher is closed, no owner"] ["owner info"="[stats] ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53 watch owner key /tidb/stats/owner/161581faba041325"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:560] ["loadSchemaInLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1061] ["globalBindHandleWorkerLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:249] ["etcd session is done, creates a new one"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.720 +08:00] [INFO] [manager.go:253] ["break campaign loop, NewSession failed"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:32.720 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.720 +08:00] [INFO] [domain.go:452] ["topNSlowQueryLoop exited."]
[2022/08/23 19:27:33.826 +08:00] [INFO] [manager.go:277] ["failed to campaign"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:33.827 +08:00] [INFO] [manager.go:249] ["etcd session is done, creates a new one"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:33.827 +08:00] [INFO] [manager.go:253] ["break campaign loop, NewSession failed"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:33.953 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:33.953 +08:00] [INFO] [domain.go:1135] ["TelemetryReportLoop exited."]
[2022/08/23 19:27:33.971 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:33.971 +08:00

Since I couldn’t solve the restart issue, I took a more forceful approach and scaled in (removed) the down TiDB node directly. After scaling in, I tried to scale out a new TiDB node, but encountered the following error during the scale-out:

Error: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.20.70.38:22' {ssh_stderr: , ssh_stdout: [2022/08/23 21:56:26.080 +08:00] [FATAL] [terror.go:292]["unexpected error"] [error="toml: cannot load TOML value of type string into a Go integer"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:292\
github.com/pingcap/tidb/config.InitializeConfig\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/config/config.go:796\
main.main\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/tidb-server/main.go:177\
runtime.main\
\t/usr/local/go/src/runtime/proc.go:225"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:292\
github.com/pingcap/tidb/config.InitializeConfig\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/config/config.go:796\
main.main\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/tidb-server/main.go:177\
runtime.main\
\t/usr/local/go/src/runtime/proc.go:225"]
, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /ssd/tidb-deploy/tidb-14000/bin/tidb-server --config-check --config=/ssd/tidb-deploy/tidb-14000/conf/tidb.toml }, cause: Process exited with status 1: check config failed

Verbose debug logs have been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2022-08-23-21-56-35.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.7.0/tiup-cluster` (wd:/home/tidb/.tiup/data/TFLG7Ha) failed: exit status 1
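
The ssh_command at the end of the error shows exactly what TiUP ran, so the failure can be reproduced by hand on the target host to confirm it is a configuration problem rather than an SSH or environment issue. A hedged sketch that simply reuses the command from the error output above:

# Re-run the same config check that TiUP performs during scale-out (command copied from the error)
export LANG=C
/ssd/tidb-deploy/tidb-14000/bin/tidb-server --config-check --config=/ssd/tidb-deploy/tidb-14000/conf/tidb.toml
# A non-zero exit with "cannot load TOML value of type string into a Go integer"
# indicates that some integer option has been given a quoted or non-numeric value.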

The current issue is that the TiDB node that went down cannot be restarted, and a new TiDB node cannot be scaled out.
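
At this point, a quick status check of every component confirms which instances TiUP currently considers Down (a minimal sketch; <cluster-name> is a placeholder):

tiup cluster display <cluster-name>   # lists each instance with its current Status (Up / Down / Disconnected)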

| username: withseid | Original post link

Here is additional log output from the faulty TiDB node; the previous post was missing some parts:

[2022/08/23 19:27:32.441 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="write tcp 10.20.70.39:14000->10.20.70.27:48664: write: connection reset by peer"] [stack="github.com/pingcap/tidb/parser/terror.Log\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\
github.com/pingcap/tidb/server.(*packetIO).writePacket\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:174\
github.com/pingcap/tidb/server.(*clientConn).writePacket\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:399\
github.com/pingcap/tidb/server.(*clientConn).writeError\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1446\
github.com/pingcap/tidb/server.(*clientConn).Run\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1078\
github.com/pingcap/tidb/server.(*Server).onConn\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:551"]
[2022/08/23 19:27:32.441 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="connection was bad"] [stack="github.com/pingcap/tidb/parser/terror.Log\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\
github.com/pingcap/tidb/server.(*clientConn).Run\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1079\
github.com/pingcap/tidb/server.(*Server).onConn\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:551"]
[2022/08/23 19:27:32.709 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.709 +08:00] [INFO] [manager.go:258] ["break campaign loop, context is done"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.712 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] []
[2022/08/23 19:27:32.715 +08:00] [INFO] [ddl_worker.go:149] ["[ddl] DDL worker closed"] [worker="worker 1, tp general"] ["take time"=1.058µs]
[2022/08/23 19:27:32.715 +08:00] [INFO] [ddl_worker.go:149] ["[ddl] DDL worker closed"] [worker="worker 2, tp add index"] ["take time"=354ns]
[2022/08/23 19:27:32.715 +08:00] [INFO] [delete_range.go:132] ["[ddl] closing delRange"]
[2022/08/23 19:27:32.715 +08:00] [INFO] [session_pool.go:86] ["[ddl] closing sessionPool"]
[2022/08/23 19:27:32.715 +08:00] [INFO] [ddl.go:417] ["[ddl] DDL closed"] [ID=4c6db34a-8534-43bb-aa8c-45d8a40f0a53] ["take time"=6.243619ms]
[2022/08/23 19:27:32.715 +08:00] [INFO] [ddl.go:328] ["[ddl] stop DDL"] [ID=4c6db34a-8534-43bb-aa8c-45d8a40f0a53]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:481] ["infoSyncerKeeper exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:529] ["topologySyncerKeeper exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1422] ["autoAnalyzeWorker exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:983] ["LoadSysVarCacheLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1293] ["loadStatsWorker exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:506] ["globalConfigSyncerKeeper exited."]
[2022/08/23 19:27:32.719 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1187] ["PlanReplayerLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:258] ["break campaign loop, context is done"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:933] ["loadPrivilegeInLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1165] ["TelemetryRotateSubWindowLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:345] ["watcher is closed, no owner"] ["owner info"="[stats] ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53 watch owner key /tidb/stats/owner/161581faba041325"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:560] ["loadSchemaInLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1061] ["globalBindHandleWorkerLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:249] ["etcd session is done, creates a new one"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.720 +08:00] [INFO] [manager.go:253] ["break campaign loop, NewSession failed"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:32.720 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.720 +08:00] [INFO] [domain.go:452] ["topNSlowQueryLoop exited."]
[2022/08/23 19:27:33.826 +08:00] [INFO] [manager.go:277] ["failed to campaign"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:33.827 +08:00] [INFO] [manager.go:249] ["etcd session is done, creates a new one"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:33.827 +08:00] [INFO] [manager.go:253] ["break campaign loop, NewSession failed"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:33.953 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:33.953 +08:00] [INFO] [domain.go:1135] ["TelemetryReportLoop exited."]
[2022/08/23 19:27:33.971 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:33.971 +08:00] [INFO] [domain.go:1102] ["handleEvolvePlanTasksLoop exited."]
[2022/08/23 19:27:35.083 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:35.083 +08:00] [INFO] [domain.go:1369] ["updateStatsWorker exited."]
[2022/08/23 19:27:35.083 +08:00] [INFO] [domain.go:685] ["domain closed"] ["take time"=2.374434803s]
[2022/08/23 19:27:35.083 +08:00] [ERROR] [client.go:752] ["[pd] fetch pending tso requests error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]context canceled: context canceled"]
[2022/08/23 19:27:35.083 +08:00] [INFO] [gc_worker.go:230] ["[gc worker] quit"] [uuid=60aab57412c0007]
[2022/08/23 19:27:35.083 +08:00] [INFO] [client.go:666] ["[pd] exit tso dispatcher"] [dc-location=global]
| username: xfworld | Original post link

It looks like there is an error in the configuration file…
Check it more carefully.

| username: withseid | Original post link

Here is the tidb.toml file on the machine being scaled out:

# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   tidb:
#     aa.b1.c3: value
#     aa.b2.c4: value
mem-quota-query = 4294967296

[log]
slow-threshold = "300i"

[performance]
txn-total-size-limit = 1073741824
| username: Kongdom | Original post link

Change it like this, with the root-level keys uncommented:

# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
server_configs:
tidb:
#     aa.b1.c3: value
#     aa.b2.c4: value
mem-quota-query = 4294967296

[log]
slow-threshold = "300i"

[performance]
txn-total-size-limit = 1073741824
| username: withseid | Original post link

Still not working. Even after changing it, the file gets overwritten: this tidb.toml lives under the conf directory and is automatically generated during scale-out. Previously, when scaling out nodes, there was never any need to manually modify the auto-generated files.
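
Since conf/tidb.toml is rendered from the server_configs section of the cluster topology, editing the generated file has no lasting effect; the value has to be changed in the topology itself. A minimal sketch of where to look, with <cluster-name> as a placeholder:

tiup cluster edit-config <cluster-name>   # server_configs.tidb here is the source of the generated conf/tidb.toml
# After saving, tiup cluster reload <cluster-name> -R tidb regenerates and pushes the file,
# which is exactly what the header comment in tidb.toml points to.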

Here is the configuration of the entire cluster obtained through the tiup cluster edit-config cluster_name command:

global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /data/tidb-deploy
  data_dir: /data/tidb-data
  os: linux
monitored:
  node_exporter_port: 19100
  blackbox_exporter_port: 19115
  deploy_dir: /data/tidb-deploy/monitored-19100
  data_dir: /data/tidb-data/monitored-19100
  log_dir: /data/tidb-deploy/monitored-19100/log
server_configs:
  tidb:
    log.slow-threshold: 300i
    mem-quota-query: 4294967296
    performance.txn-total-size-limit: 1073741824
  tikv:
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: true
    readpool.unified.max-thread-count: 25
    storage.block-cache.capacity: 48G
  pd:
    replication.location-labels:
    - host
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
  tiflash: {}
  tiflash-learner: {}
  pump: {}
  drainer: {}
  cdc: {}
tidb_servers:
- host: 10.20.70.39
  ssh_port: 22
  port: 14000
  status_port: 10080
  deploy_dir: /ssd/tidb-deploy/tidb-14000
  log_dir: /ssd/tidb-deploy/tidb-14000/log
  numa_node: "1"
  arch: amd64
  os: linux
- host: 10.20.70.24
  ssh_port: 22
  port: 14000
  status_port: 10080
  deploy_dir: /ssd/tidb-deploy/tidb-14000
  log_dir: /ssd/tidb-deploy/tidb-14000/log
  arch: amd64
  os: linux
- host: 10.20.70.37
  ssh_port: 22
  port: 14000
  status_port: 10080
  deploy_dir: /ssd/tidb-deploy/tidb-14000
  log_dir: /ssd/tidb-deploy/tidb-14000/log
  numa_node: "1"
  arch: amd64
  os: linux
tikv_servers:
- host: 10.20.70.38
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /ssd/tidb-deploy/tikv-20160
  data_dir: /ssd/tidb-data/tikv-20160
  log_dir: /ssd/tidb-deploy/tikv-20160/log
  numa_node: "0"
  config:
    server.labels:
      host: tikv2
  arch: amd64
  os: linux
- host: 10.20.70.39
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /ssd/tidb-deploy/tikv-20160
  data_dir: /ssd/tidb-data/tikv-20160
  log_dir: /ssd/tidb-deploy/tikv-20160/log
  numa_node: "0"
  config:
    server.labels:
      host: tikv3
  arch: amd64
  os: linux
- host: 10.20.70.24
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /ssd/tidb-deploy/tikv-20160
  data_dir: /ssd/tidb-data/tikv-20160
  log_dir: /ssd/tidb-deploy/tikv-20160/log
  config:
    server.labels:
      host: tikv5
  arch: amd64
  os: linux
- host: 10.20.70.37
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /ssd/tidb-deploy/tikv-20160
  data_dir: /ssd/tidb-data/tikv-20160
  log_dir: /ssd/tidb-deploy/tikv-20160/log
  numa_node: "0"
  config:
    server.labels:
      host: tikv1
  arch: amd64
  os: linux
tiflash_servers: []
pd_servers:
- host: 10.20.70.38
  ssh_port: 22
  name: pd-10.20.70.38-12379
  client_port: 12379
  peer_port: 2380
  deploy_dir: /ssd/tidb-deploy/pd-12379
  data_dir: /ssd/tidb-data/pd-12379
  log_dir: /ssd/tidb-deploy/pd-12379/log
  numa_node: "1"
  arch: amd64
  os: linux
- host: 10.20.70.39
  ssh_port: 22
  name: pd-10.20.70.39-12379
  client_port: 12379
  peer_port: 2380
  deploy_dir: /ssd/tidb-deploy/pd-12379
  data_dir: /ssd/tidb-data/pd-12379
  log_dir: /ssd/tidb-deploy/pd-12379/log
  numa_node: "1"
  arch: amd64
  os: linux
- host: 10.20.70.37
  ssh_port: 22
  name: pd-10.20.70.37-12379
  client_port: 12379
  peer_port: 2380
  deploy_dir: /ssd/tidb-deploy/pd-12379
  data_dir: /ssd/tidb-data/pd-12379
  log_dir: /ssd/tidb-deploy/pd-12379/log
  numa_node: "1"
  arch: amd64
  os: linux
cdc_servers:
- host: 10.20.70.39
  ssh_port: 22
  port: 8300
  deploy_dir: /data/tidb-deploy/cdc-8300
  data_dir: /data/tidb-data/cdc-8300
  log_dir: /data/tidb-deploy/cdc-8300/log
  gc-ttl: 1209600
  arch: amd64
  os: linux
- host: 10.20.70.38
  ssh_port: 22
  port: 8300
  deploy_dir: /data/tidb-deploy/cdc-8300
  data_dir: /data/tidb-data/cdc-8300
  log_dir: /data/tidb-deploy/cdc-8300/log
  gc-ttl: 1209600
  arch: amd64
  os: linux
- host: 10.20.70.24
  ssh_port: 22
  port: 8300
  deploy_dir: /data/tidb-deploy/cdc-8300
  data_dir: /data/tidb-data/cdc-8300
  log_dir: /data/tidb-deploy/cdc-8300/log
  gc-ttl: 1209600
  arch: amd64
  os: linux
monitoring_servers:
- host: 10.20.70.36
  ssh_port: 22
  port: 9090
  deploy_dir: /data/tidb-deploy/prometheus-9090
  data_dir: /data/tidb-data/prometheus-9090
  log_dir: /data/tidb-deploy/prometheus-9090/log
  external_alertmanagers: []
  arch: amd64
  os: linux
grafana_servers:
- host: 10.20.70.36
  ssh_port: 22
  port: 3000
  deploy_dir: /data/tidb-deploy/grafana-3000
  arch: amd64
  os: linux
  username: admin
  password: admin
  anonymous_enable: false
  root_url: ""
  domain: ""
alertmanager_servers:
- host: 10.20.70.36
  ssh_port: 22
  web_port: 9093
  cluster_port: 9094
  deploy_dir: /data/tidb-deploy/alertmanager-9093
  data_dir: /data/tidb-data/alertmanager-9093
  log_dir: /data/tidb-deploy/alertmanager-9093/log
  arch: amd64
  os: linux
| username: withseid | Original post link

Solved it. It was caused by an extra ‘i’ added when modifying the config, specifically in log.slow-threshold: 300i.
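
For completeness, a hedged sketch of the resulting fix flow (the cluster name is a placeholder, and the scale-out topology file name is hypothetical):

tiup cluster edit-config <cluster-name>              # change log.slow-threshold: 300i  ->  log.slow-threshold: 300
tiup cluster reload <cluster-name> -R tidb           # regenerate and push conf/tidb.toml to the TiDB nodes
tiup cluster scale-out <cluster-name> scale-out.yaml # retry adding the new TiDB node; the config check should now pass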