Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiKV cannot start after crashing (v6.1.0)
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] V6.1.0
[Encountered Issue] TiKV crashed and cannot be restarted
[Reproduction Path] I checked the logs of the TiKV node and found only errors, with no specific reason for the crash
[Issue Phenomenon and Impact]
A TiKV node in a running TiDB cluster crashed, and its service cannot be started to rejoin the cluster.
The error occurred around 19:54 on 2022-10-12 and was recorded in the tikv_stderr log. There was no impact on the business, but after we discovered it, the node could not be started manually, and the specific cause of the crash has not been identified. Attached: tikv_stderr.log (24.9 KB)
[Attachment]
https://mega.nz/folder/1g1mxSqa#SMbj_Qew8Ao8xx5LCGDqCQ
Please provide the version of each component, such as cdc and tikv. You can get it by running cdc version or tikv-server --version.
Does the tidb user have permission to access the file?
Yes, it has the necessary permissions. The cluster was deployed with the tidb user, and ownership of the directories has been granted to tidb.
Are there enough resources?
It is possible that the mounted disk has been disconnected.
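One way to check the mount question is to find which filesystem actually backs the data directory. The Python sketch below is only an illustration (the function names are hypothetical and not part of TiDB or TiUP): it walks up from a directory to its nearest mount point and reports free space there, which also touches on the resource question. It assumes /data is meant to be a separate mount, using the /data and /data/tidb-data paths mentioned later in the thread as examples. It only detects a mount that is gone; a disk that is still mounted but failing I/O will not show up here.

```python
import os
import shutil


def find_mount_point(path: str) -> str:
    """Walk up from `path` to the nearest directory that is a mount point."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root; stop here
            break
        current = parent
    return current


def check_data_disk(data_dir: str, expected_mount: str) -> dict:
    """Report which mount backs `data_dir`, whether it is the expected one,
    and how much space is free on that mount."""
    mount = find_mount_point(data_dir)
    # Query the mount point itself, which exists even if data_dir does not.
    usage = shutil.disk_usage(mount)
    return {
        "mount_point": mount,
        "on_expected_mount": mount == os.path.realpath(expected_mount),
        "free_gib": round(usage.free / 2**30, 1),
    }
```

For example, check_data_disk("/data/tidb-data", "/data") returning on_expected_mount False would suggest the disk is no longer mounted at /data and the path now falls through to the root filesystem.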
According to the monitoring, resource usage was not high at that time.
Try logging in as the tidb user and touching a file on the corresponding data disk to see whether you have the necessary permissions. If that doesn't work, consider scaling in or out.
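The touch test can also be scripted. This is a minimal Python sketch (the function name is hypothetical) that creates and immediately removes a uniquely named probe file, roughly what touch followed by rm does. Run it as the tidb user, for example with sudo -u tidb python3, so the check uses that account's permissions; the directories to test are assumed to be the deploy and data paths mentioned later in the thread.

```python
import os
import uuid


def can_create_file(directory: str) -> bool:
    """Mimic `touch` in `directory`: create a throwaway file, then delete it."""
    probe = os.path.join(directory, f".perm-probe-{uuid.uuid4().hex}")
    try:
        # O_EXCL guarantees we never overwrite an existing file.
        fd = os.open(probe, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError as exc:
        print(f"cannot create file in {directory}: {exc.strerror}")
        return False
    os.close(fd)
    os.remove(probe)  # leave the directory as we found it
    return True
```

For example, call can_create_file("/data/tidb-data") and can_create_file("/data/tidb-deploy"); a False result prints the underlying error, such as "Permission denied".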
I can touch the file normally, so permissions don't seem to be the issue.
By scaling, do you mean expanding the disk?
I mean scaling this TiKV node instance in or out.
I have already scaled out using tiup cluster scale-out.
Additionally, I have a question. deploy_dir and data_dir are /data/tidb-deploy and /data/tidb-data. With the tidb account I can touch files in the tidb-deploy and tidb-data directories and their subdirectories, but I don't have permission on the parent directory /data. Could this be the cause of the issue?
Based on the logs, it appears to be a permissions issue. Try granting the necessary permissions and see whether that resolves the problem.
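On Linux, opening any path requires search (execute) permission on every directory along it, while write permission on a parent such as /data is only needed to create, delete, or rename entries directly inside that parent. Since files can already be created under /data/tidb-data, the tidb account must be able to traverse /data, so the missing permission on /data is presumably read or write; whether TiKV needs either is not established in this thread. The hypothetical helper below walks a path from the root down and returns the first directory the current user cannot search, which helps rule traversal in or out.

```python
import os
from pathlib import Path
from typing import Optional


def first_untraversable_dir(path: str) -> Optional[str]:
    """Return the first directory on the way to `path` that the current user
    cannot search (no execute permission), or None if all are traversable."""
    target = Path(path).absolute()
    # Check from the root down, e.g. "/", "/data", "/data/tidb-data".
    for directory in reversed([target, *target.parents]):
        # os.access checks the real UID/GID, so run this as the tidb user.
        if os.path.isdir(directory) and not os.access(directory, os.X_OK):
            return str(directory)
    return None
```

Running first_untraversable_dir("/data/tidb-data") as tidb and getting None would indicate that traversal is not the problem, pointing instead at read or write permission on /data itself.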