Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIDB服务起不来,分区文件被锁 (TiDB service won't start, partition file is locked)
[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.0
[Reproduction Path] /home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data12/tidb/data/tikv-20162/ bad-regions
[Encountered Problem] Partition file is locked, causing TiDB startup failure
Is the failing component TiDB or TiKV? It looks like a TiKV issue. If so, run SHOW PROCESSLIST to see which processes are currently executing, and please provide more detailed information.
The cluster won't start, so I can't run SHOW PROCESSLIST.
Check which process holds the file, either by file name or by process ID:
lsof yourfile
lsof -p <PID>
Then kill the process that holds the lock. Does this work?
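The lsof-then-kill steps above can be sketched as a small script. This is only an illustration: the `lock_holders` helper and the `LOCK_FILE` variable are made-up names, and the script prints a suggested `kill` command instead of running it, so you can confirm what the process is first.

```shell
#!/bin/sh
# Print the PIDs (one per line) of processes that hold FILE open.
# Prints nothing if no process holds it, or if lsof is not installed.
lock_holders() {
    file=$1
    if ! command -v lsof >/dev/null 2>&1; then
        echo "lsof not found; cannot inspect $file" >&2
        return 0
    fi
    # -t prints bare PIDs only; lsof exits non-zero when nothing matches,
    # so "|| true" keeps that case from looking like a failure.
    lsof -t -- "$file" 2>/dev/null || true
}

# LOCK_FILE is an assumption: point it at the locked file on your node.
if [ -n "${LOCK_FILE:-}" ]; then
    for pid in $(lock_holders "$LOCK_FILE"); do
        # Show what the process is before deciding whether to stop it.
        ps -o pid,user,args -p "$pid"
        echo "to stop it: kill $pid"
    done
fi
```

Run it with `LOCK_FILE` set to the path of the locked file. If the holder turns out to be a TiKV process managed by TiUP, stopping that instance through TiUP is probably safer than a raw kill.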
Lock issue resolved, now reporting:
[2024/05/23 08:27:36.524 +08:00] [ERROR] [tidb.go:89] ["[ddl] init domain failed"] [error="[tikv:9005]Region is unavailable"]
[2024/05/23 08:27:44.024 +08:00] [INFO] [tidb.go:74] ["new domain"] [store=tikv-7160479475851883215] ["ddl lease"=45s] ["stats lease"=3s] ["index usage sync lease"=0s]
TiKV is down, which causes the error: Region is unavailable.
Two TiKV nodes are offline, but there are still 10 functioning normally.
How did you take them offline? If the regions are not completely cleaned up, some replicas will remain on these two TiKV nodes, which will cause issues with that portion of the data.
If two of the three replicas of a region happen to be on these two TiKV nodes, that region loses its Raft majority and becomes unavailable.
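The majority rule behind this can be checked with a few lines of arithmetic. This is a minimal sketch assuming three replicas per region, as mentioned in this thread; `region_has_quorum` is a made-up helper name, not a TiKV or PD command.

```shell
#!/bin/sh
# Succeeds (exit 0) if a region with TOTAL replicas still has a Raft
# majority after LOST replicas become unreachable.
region_has_quorum() {
    total=$1
    lost=$2
    majority=$(( total / 2 + 1 ))   # 3 replicas -> majority of 2
    alive=$(( total - lost ))
    [ "$alive" -ge "$majority" ]
}

# Walk through the three-replica case.
for lost in 0 1 2; do
    if region_has_quorum 3 "$lost"; then
        echo "3 replicas, $lost lost: available"
    else
        echo "3 replicas, $lost lost: unavailable"
    fi
done
```

With three replicas, losing one still leaves a majority of two, while losing two leaves a single replica that cannot elect a leader. This is why a cluster with 10 healthy TiKV nodes can still report "Region is unavailable" for the affected regions.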
I used tiup cluster scale-in tidb-JBDP --node 10.114.26.112:20161
When using tikv-ctl in local mode, you need to stop the TiKV instance first. Your TiDB may be failing to start because of an abnormal region status, which might be related to your scale-in. Seek help from official support.
Is there an error in the TiKV file system, or was the lock file manually modified?
It seems that the offline node encountered an anomaly and Region balancing has not completed.