TiKV fails to start after a crash (v6.1.0)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tikv 宕机后无法启动 v6.1.0 。

| username: anly_zhang

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] V6.1.0
[Encountered Issue] TiKV crashed and cannot be restarted
[Reproduction Path] I checked the logs of the TiKV node and found only errors, with no specific reason for the crash
[Issue Phenomenon and Impact]
A TiKV node in a running TiDB cluster crashed, and its service can no longer be started to rejoin the cluster.
The error occurred around 19:54 on 2022-10-12 and was recorded in the tikv_stderr log. The business was not affected, but after we discovered the crash, the node could not be started manually, and we have not identified the specific cause. tikv_stderr.log (24.9 KB)
[Attachment]
https://mega.nz/folder/1g1mxSqa#SMbj_Qew8Ao8xx5LCGDqCQ

Please provide the version information of each component, such as cdc or tikv, which can be obtained by running cdc version or tikv-server --version.

| username: Meditator | Original post link

Does the tidb user have permission to access the file?

| username: anly_zhang | Original post link

Yes, it has the necessary permissions. The cluster was deployed as the tidb user, and the directories have been authorized for tidb.

| username: zhouzeru | Original post link

Are there enough resources?

| username: undefined | Original post link

It is possible that the mounted disk has been disconnected.

| username: anly_zhang | Original post link

I checked the monitoring, and the usage of each resource was not high at that time.

| username: WalterWj | Original post link

Try logging in as the tidb user and touching a file on the corresponding data disk to check whether you have the necessary permissions :thinking:. If that doesn’t help, consider scaling in or out.
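The write test suggested above can also be scripted. The following is a minimal sketch, assuming Python 3 is available on the TiKV host and that the script is run as the tidb user; the function name `can_create_file` and the probe file prefix are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def can_create_file(directory: str) -> bool:
    """Return True if the current user can create (and remove) a file in `directory`."""
    try:
        # NamedTemporaryFile creates the probe file and deletes it again on close,
        # so nothing is left behind in the data directory.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".write_probe_"):
            pass
        return True
    except OSError:
        # Covers missing directories, permission errors, read-only filesystems, etc.
        return False
```

Calling `can_create_file` with the TiKV data directory returns `False` if the directory is missing, not writable, or on a filesystem that has gone read-only, which also helps rule out a disconnected mount.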

| username: anly_zhang | Original post link

I can touch a file normally, so there seems to be no issue with permissions.

By scaling, do you mean expanding the disk?

| username: WalterWj | Original post link

I mean scaling this TiKV node instance in or out.

| username: anly_zhang | Original post link

I have already scaled out using tiup cluster scale-out.

I also have a question. The paths for deploy_dir and data_dir are /data/tidb-deploy and /data/tidb-data. Using the tidb account, I can touch files in the tidb-deploy and tidb-data directories and their subdirectories, but I don’t have permission in the parent directory /data. Could this be the cause of the issue?
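On POSIX systems, reaching a file requires search (execute) permission on every directory along its path, while write permission on a parent such as /data is only needed to create or delete entries directly inside that parent. The following is a minimal sketch for checking the whole chain, assuming Python 3.9+ and that it is run as the tidb user; the function name `traversable_chain` is hypothetical, and the paths are the ones quoted in this post.

```python
import os
from pathlib import Path


def traversable_chain(path: str) -> list[tuple[str, bool]]:
    """List each directory from / down to `path` with whether it can be traversed.

    os.access checks against the real user and group IDs of the calling process.
    """
    target = Path(path).resolve()
    # Path.parents is ordered nearest-first, so reverse it to walk from / downwards.
    chain = list(target.parents)[::-1] + [target]
    return [(str(d), os.access(d, os.X_OK)) for d in chain]


if __name__ == "__main__":
    # Paths taken from the post; adjust them for other deployments.
    for data_path in ("/data/tidb-deploy", "/data/tidb-data"):
        for directory, ok in traversable_chain(data_path):
            print(f"{'ok  ' if ok else 'FAIL'} {directory}")
```

If every entry is reported as traversable, the missing write permission on /data is unlikely to be what blocks TiKV from using its subdirectories, though this check alone does not confirm the cause of the crash.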

| username: TiDBer_muzijiang | Original post link

Based on the logs, this appears to be a permissions issue. Try granting the necessary permissions and see whether that resolves the problem.