TiKV node is inaccessible

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv节点无法访问

| username: TiDBer_RQobNXGv

After restarting, the failure information is as follows:
The TiKV node log is as follows:

| username: 裤衩儿飞上天 | Original post link

Take a look at the logs. I see that there are two TiKV nodes on one server, and the other one is fine. Most likely, this node has crashed.

| username: TiDBer_RQobNXGv | Original post link

How do I restart a single node? Please help.

| username: 裤衩儿飞上天 | Original post link

  1. You can first briefly check the logs for any errors to see why the node went down.

  2. If you just want to start the node and then check for any errors, use the start command, where XXXXXX is the cluster name and ip:port identifies the node:

    tiup cluster start XXXXXX -N ip:port
    

    For more details, you can refer to tiup cluster help.


PS: Rushing might make things worse~
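For step 1 above, a minimal sketch of pulling the latest error entries out of a TiKV log before attempting a restart. The function name `recent_tikv_errors` is made up for illustration, and the log path must be supplied for your own deployment. It assumes the usual TiKV log layout, where the level appears in square brackets such as `[ERROR]` or `[FATAL]`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent ERROR/FATAL entries from a TiKV log.
# Usage: recent_tikv_errors <log-file> [max-lines]
recent_tikv_errors() {
  local log_file="$1"
  local max_lines="${2:-20}"   # default to the last 20 matching lines

  if [ ! -r "$log_file" ]; then
    echo "cannot read log file: $log_file" >&2
    return 1
  fi

  # Keep only lines whose level field is ERROR or FATAL, then take the newest ones.
  grep -E '\[(ERROR|FATAL)\]' "$log_file" | tail -n "$max_lines"
}
```

Call it as `recent_tikv_errors <path-to-tikv.log> 50`. The last FATAL line before the process exits is usually the most informative one.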

| username: 像风一样的男子 | Original post link

Generally, a node will restart automatically. If it doesn’t come up, you need to check the TiKV logs for errors.

| username: TiDBer_RQobNXGv | Original post link

I have added the error information, please take a look.

| username: TiDBer_RQobNXGv | Original post link

I have added the node error information, please take a look.

| username: 裤衩儿飞上天 | Original post link

First, check if your disk-related mounts are all normal. If everything is normal, then check if it’s a bug. I can’t access GitHub right now.
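A rough sketch of a quick health probe for a data directory, in case it helps with the mount check. The function name `check_data_dir` and the probe file name are made up for illustration. It only tests that the directory exists, can be listed, and accepts a small write; it does not replace looking at `df -h` or the kernel log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a directory exists, is listable, and is writable.
# Usage: check_data_dir <directory>
check_data_dir() {
  local dir="$1"
  local probe

  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi

  # Listing the directory can surface read-side I/O errors.
  if ! ls "$dir" >/dev/null 2>&1; then
    echo "unreadable: $dir"
    return 1
  fi

  # Writing a small probe file can surface a read-only remount or write errors.
  probe="$dir/.io_probe.$$"
  if ! ( echo probe > "$probe" ) 2>/dev/null; then
    echo "unwritable: $dir"
    return 1
  fi
  rm -f "$probe"

  echo "ok: $dir"
}
```

Run it against the mount point of each TiKV instance. A result other than `ok` on only one of them would point at that disk.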

| username: 像风一样的男子 | Original post link

It is most likely a disk issue; it looks like the SST file is corrupted. Check the directory of this TiKV instance:

| username: TiDBer_jYQINSnf | Original post link

The error you’re encountering is due to the options file being missing. How could the file be lost? Disk damage? Try copying an Options file from another directory, usually named db/OPTIONS-xxx. This is the startup configuration for RocksDB, so it should generally be the same for all TiKV instances. If that doesn’t work, you might need to physically destroy and rebuild it.
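The copy suggested above could look roughly like this. It is only a sketch: the function name `copy_latest_options` is made up, both directory arguments are placeholders for the `db` directories of a healthy and a broken TiKV instance, and it assumes the broken instance is stopped. Whether TiKV accepts the copied file is untested here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the newest OPTIONS-<number> file from a healthy
# instance's db directory into the broken instance's db directory.
# Usage: copy_latest_options <healthy-db-dir> <broken-db-dir>
copy_latest_options() {
  local healthy_db="$1"
  local broken_db="$2"
  local latest target

  # RocksDB names these files OPTIONS-<number>; pick the highest-numbered one.
  latest="$(ls -1 "$healthy_db"/OPTIONS-* 2>/dev/null | sort -V | tail -n 1)"
  if [ -z "$latest" ]; then
    echo "no OPTIONS file found in $healthy_db" >&2
    return 1
  fi

  # Refuse to overwrite anything that already exists on the broken side.
  target="$broken_db/$(basename "$latest")"
  if [ -e "$target" ]; then
    echo "already exists: $target" >&2
    return 1
  fi

  cp "$latest" "$target" && echo "$target"
}
```

On success it prints the path of the new file. If the disk itself is failing, the copy may fail too or other files may be damaged, in which case rebuilding the node is the remaining option.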

| username: tidb菜鸟一只 | Original post link

It looks like the file is corrupted. How about trying to scale down and then scale up?
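For reference, a sketch of what the scale-in and scale-out sequence might look like with TiUP. The function `print_rebuild_plan` is made up and only prints the commands so they can be reviewed first; the cluster name, node address, and topology file are placeholders. It assumes the cluster still has enough TiKV instances for the configured replica count after the node is removed, and that the disk problem is fixed before scaling out again.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a scale-in / scale-out sequence for one node.
# Usage: print_rebuild_plan <cluster-name> <ip:port> <scale-out-topology.yaml>
# Steps printed, in order:
#   1. scale-in   removes the broken TiKV node from the cluster
#   2. display    shows node status, to confirm the node has become Tombstone
#   3. prune      cleans up Tombstone nodes
#   4. scale-out  adds the node back using a topology file
print_rebuild_plan() {
  local cluster="$1"
  local node="$2"
  local topology="$3"

  cat <<EOF
tiup cluster scale-in $cluster -N $node
tiup cluster display $cluster
tiup cluster prune $cluster
tiup cluster scale-out $cluster $topology
EOF
}
```

Printing the plan first fits the earlier advice about not rushing: each command can be checked against `tiup cluster help` and run by hand once the previous step has finished.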

| username: 这里介绍不了我 | Original post link

Following this thread to learn.

| username: TIDB-Learner | Original post link

Did you mount two hard drives or just one? It feels like the former, and I suspect one of the drives has an issue. Check with fdisk -l.

| username: dba远航 | Original post link

There is an issue with the IO system.

| username: wangccsy | Original post link

The node is in use, right?

| username: zhh_912 | Original post link

It is necessary to analyze the logs to determine the cause of the error.

| username: zhanggame1 | Original post link

The log reports an I/O error; most likely the hard drive has failed.

| username: lemonade010 | Original post link

The disk is most likely having issues.

| username: redgame | Original post link

You can scale in and then scale out now.

| username: cassblanca | Original post link

The disk mounted on the /data2 directory of the 192.168.35.52 server has issues, causing file corruption and inability to read the configuration properly.