Urgent - TiKV Unable to Start

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 救急-tikv启动不了

| username: rw12306

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
v5.4.0
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]

Two TiKV nodes suddenly couldn’t start.

The TiKV error is as follows:

| username: xfworld | Original post link

Are there more details in the logs? It looks like it crashed.


Check what this file describes.

| username: rw12306 | Original post link

The log only contains this information.
The panic_mark_file file is empty, there’s nothing in it.

log.log (24.3 KB)

| username: tidb菜鸟一只 | Original post link

Is the disk full? With two of the three TiKV nodes down, recovery may be difficult. The cluster can be brought back up, but some data loss is likely. See: 专栏 - TiDB集群恢复之TiKV集群不可用 | TiDB 社区 (Column: TiDB cluster recovery when the TiKV cluster is unavailable, TiDB Community).

| username: rw12306 | Original post link

The space is not full, only about 10% is used.

| username: songxuecheng | Original post link

First, temporarily move the panic_mark_file to another location. Then try restarting.
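
A minimal sketch of this step, assuming the marker sits directly under the TiKV data directory. The function name, the TIKV_DATA_DIR and BACKUP_DIR variables, and the backup path are hypothetical; substitute the node's real data directory.

```shell
#!/usr/bin/env bash
# Move TiKV's panic marker out of the data directory before the next start
# attempt. The marker is kept (not deleted) so it can be inspected or restored.
move_panic_mark() {
  local data_dir="$1" backup_dir="$2"
  local mark="$data_dir/panic_mark_file"
  if [ ! -e "$mark" ]; then
    echo "no panic_mark_file in $data_dir"
    return 0
  fi
  mkdir -p "$backup_dir" || return 1
  # Timestamp the copy so repeated attempts do not overwrite each other.
  mv "$mark" "$backup_dir/panic_mark_file.$(date +%Y%m%d%H%M%S)" || return 1
  echo "moved $mark to $backup_dir"
}

# Only act when a data directory is supplied explicitly (hypothetical variable).
if [ -n "${TIKV_DATA_DIR:-}" ]; then
  move_panic_mark "$TIKV_DATA_DIR" "${BACKUP_DIR:-/tmp/tikv-panic-backup}"
fi
```

After moving the file, the node can be started again, for example with tiup cluster start <cluster-name> -N <host:port>.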

| username: rw12306 | Original post link

That didn’t work. The file is recreated on startup, and the same error is reported.

| username: songxuecheng | Original post link

Then check the system logs for anomalies, starting with /var/log/messages.
After that, check dmesg.
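
One way to run both checks is sketched below. The keyword list and the function name scan_system_log are assumptions chosen for illustration, not an exhaustive set of crash indicators.

```shell
#!/usr/bin/env bash
# Print lines from a syslog-style file that often accompany a process crash
# (OOM kills, disk I/O errors, segfaults, or anything mentioning tikv).
scan_system_log() {
  local log_file="$1"
  grep -iE 'tikv|out of memory|oom[-_ ]kill|i/o error|segfault' "$log_file"
}

# /var/log/messages exists on RHEL/CentOS-style hosts and usually needs root.
if [ -r /var/log/messages ]; then
  scan_system_log /var/log/messages | tail -n 50
fi

# Kernel ring buffer with human-readable timestamps (-T), last 50 lines.
if command -v dmesg >/dev/null 2>&1; then
  dmesg -T 2>/dev/null | tail -n 50
fi
```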

| username: rw12306 | Original post link

The /var/log/messages log is as follows:

The dmesg log is as follows:

| username: ohammer | Original post link

Waiting for an expert to share a solution. If nothing else works, fall back on the universal fix: restart.

| username: rw12306 | Original post link

Can’t restart.

| username: rw12306 | Original post link

I used the tikv-ctl tool to analyze it, and no error messages were returned.

| username: h5n1 | Original post link

If you have enough resources, first scale out two new TiKV nodes, then perform multi-replica failure recovery.
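
A rough outline of the commands involved, printed rather than executed. Every value here is hypothetical (cluster name, topology file, PD address, failed store IDs, data directory), and the sequence is a sketch based on the v5.x multi-replica recovery procedure as I understand it; check it against the official documentation before running anything, since the last step can lose recent writes.

```shell
#!/usr/bin/env bash
# Print the rough command sequence for scale-out followed by unsafe recovery.
print_recovery_plan() {
  local cluster="$1" topology="$2" pd_addr="$3" failed_stores="$4" data_dir="$5"
  # 1. Add two fresh TiKV nodes described in a scale-out topology file.
  echo "tiup cluster scale-out $cluster $topology"
  # 2. Look up the store IDs of the TiKV nodes that cannot start.
  echo "tiup ctl:v5.4.0 pd -u $pd_addr store"
  # 3. On each surviving TiKV (stopped first), drop the failed stores from
  #    the Region peer lists.
  echo "tikv-ctl --data-dir $data_dir unsafe-recover remove-fail-stores -s $failed_stores --all-regions"
}

# Example invocation with placeholder values only.
print_recovery_plan tidb-test scale-out.yaml http://127.0.0.1:2379 1,2 /path/to/tikv
```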

| username: rw12306 | Original post link

Will scaling out in the cluster's current state cause data loss?

| username: rw12306 | Original post link

What is the most likely cause of this? It stopped working without warning.

| username: h5n1 | Original post link

Install Clinic (tiup install diag), then collect the diagnostic data covering the period before the outage and upload it for the official team to review.
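
The steps might look like the following sketch, which prints the commands instead of running them. The cluster name and the time window are hypothetical, and uploading normally also requires a Clinic access token to be configured first.

```shell
#!/usr/bin/env bash
# Print the Diag commands for collecting data over a time window.
# from/to accept relative offsets such as "-4h" (four hours ago).
print_diag_commands() {
  local cluster="$1" from="$2" to="$3"
  echo "tiup install diag"
  echo "tiup diag collect $cluster -f=\"$from\" -t=\"$to\""
  # The directory to upload is reported by the collect step.
  echo "tiup diag upload <collected-data-dir>"
}

# Example: a window from 4 hours ago to 2 hours ago on a placeholder cluster.
print_diag_commands tidb-test "-4h" "-2h"
```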

| username: Billmay表妹 | Original post link

Please refer to this and upload it~

Also, remember to grant viewing permission to the people who replied to you.

Copy and paste the complete error as well.

| username: rw12306 | Original post link

Can this be done without internet access?

| username: h5n1 | Original post link

Download the offline package, then point TiUP at it with tiup mirror set.
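
A small sketch of that step, assuming the offline package has already been copied to the host and unpacked. The function name and the TIUP_OFFLINE_MIRROR variable are hypothetical.

```shell
#!/usr/bin/env bash
# Point TiUP at a local (offline) mirror directory after checking it exists.
use_offline_mirror() {
  local mirror_dir="$1"
  if [ ! -d "$mirror_dir" ]; then
    echo "mirror directory not found: $mirror_dir" >&2
    return 1
  fi
  if command -v tiup >/dev/null 2>&1; then
    tiup mirror set "$mirror_dir"
  else
    # Fall back to printing the command when tiup is not installed here.
    echo "tiup not found; would run: tiup mirror set $mirror_dir"
  fi
}

# Only act when a mirror path is supplied explicitly.
if [ -n "${TIUP_OFFLINE_MIRROR:-}" ]; then
  use_offline_mirror "$TIUP_OFFLINE_MIRROR"
fi
```

Once the mirror is set, tiup install diag should resolve the component from the local directory instead of the network.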

| username: xfworld | Original post link

Isn’t there a node that’s still alive? Can you recover data from that node?

First, try expanding the resources of the TiKV node.