Failed to Decommission TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv下线失败 (TiKV decommissioning failed)

| username: 奋斗的大象

[2024/05/22 14:17:59.223 +08:00] [FATAL] [server.rs:428] ["panic_mark_file /data11/tidb/data/tikv-20161/panic_mark_file exists, there must be something wrong with the db. Do not remove the panic_mark_file and force the TiKV node to restart. Please contact TiKV maintainers to investigate the issue. If needed, use scale in and scale out to replace the TiKV node. Scale a TiDB Cluster Using TiUP | PingCAP Docs"]

| username: caiyfc

Generally, this message means TiKV panicked at some point. Check the TiKV log from around the time the panic_mark_file was created to find the specific error, then search GitHub for related issues. If you don’t need to investigate further, delete this file and restart TiKV.
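A minimal Python sketch of the check described above, assuming the TiKV log lines follow the [time] [LEVEL] [file:line] [message] layout shown in the original post. It reports whether panic_mark_file is present in a data directory and collects the most recent FATAL/ERROR lines from the log. The helper names and the log path are hypothetical; the mark file is usually just a marker, so the log is where the actual panic message tends to be.

```python
import os
import re
from collections import deque
from pathlib import Path

# Matches the level field of lines such as:
# [2024/05/22 14:17:59.223 +08:00] [FATAL] [server.rs:428] ["..."]
LEVEL_RE = re.compile(r"^\[[^\]]+\] \[(FATAL|ERROR)\]")


def panic_mark_exists(data_dir):
    """Return True if a panic_mark_file exists in the given TiKV data directory."""
    return (Path(data_dir) / "panic_mark_file").exists()


def last_errors(log_path, limit=20):
    """Return up to `limit` of the most recent FATAL/ERROR lines in a log file."""
    hits = deque(maxlen=limit)  # keeps only the newest `limit` matches
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if LEVEL_RE.match(line):
                hits.append(line.rstrip("\n"))
    return list(hits)


if __name__ == "__main__":
    # Hypothetical paths: substitute your deployment's data_dir and log_dir.
    data_dir = "/data11/tidb/data/tikv-20161"
    log_path = "/data11/tidb/deploy/tikv-20161/log/tikv.log"
    print("panic_mark_file present:", panic_mark_exists(data_dir))
    if os.path.exists(log_path):
        for line in last_errors(log_path):
            print(line)
```

The last FATAL line is often the panic_mark_file message itself; the lines just before it in an earlier log file are usually the ones worth searching for on GitHub.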

| username: TiDBer_q2eTrp5h

I’ve also encountered a similar issue.

| username: TiDBer_q2eTrp5h

Just restart TiKV; it should be fine. As far as I remember, I didn’t do anything else at the time, and it mysteriously recovered after the restart.

| username: zhaokede

My only worry is that taking the node offline directly will result in data loss.

| username: TIDB-Learner

The log message is quite clear: deleting the file is not recommended. To avoid data loss, it is better to scale out first and then scale in the problematic node.
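A minimal Python sketch of the "scale out first, then scale in" order, expressed as a list of TiUP commands plus a scale-out topology fragment. It only builds the commands and does not run them. The cluster name, hosts, ports, and file name are hypothetical; check the TiUP scale-out/scale-in documentation for your version before using it.

```python
import shlex
from dataclasses import dataclass


@dataclass
class TiKVNode:
    host: str
    port: int = 20160
    status_port: int = 20180
    data_dir: str = ""  # optional; omitted from the topology when empty


def scale_out_topology(node):
    """Render a minimal tikv_servers topology fragment for tiup cluster scale-out."""
    lines = [
        "tikv_servers:",
        f"  - host: {node.host}",
        f"    port: {node.port}",
        f"    status_port: {node.status_port}",
    ]
    if node.data_dir:
        lines.append(f"    data_dir: {node.data_dir}")
    return "\n".join(lines) + "\n"


def replacement_plan(cluster, topology_file, old_node):
    """Return the command sequence: add the new TiKV first, then retire the old one."""
    return [
        ["tiup", "cluster", "scale-out", cluster, topology_file],
        # Confirm the new TiKV node is Up before touching the broken one.
        ["tiup", "cluster", "display", cluster],
        ["tiup", "cluster", "scale-in", cluster, "--node", old_node],
        # Re-run until the old node shows Tombstone (its Regions have moved away).
        ["tiup", "cluster", "display", cluster],
        ["tiup", "cluster", "prune", cluster],
    ]


if __name__ == "__main__":
    new = TiKVNode(host="10.0.1.5", port=20161, status_port=20181)
    print(scale_out_topology(new))
    for cmd in replacement_plan("mycluster", "scale-out.yaml", "10.0.1.4:20161"):
        print(shlex.join(cmd))
```

Keeping the plan as data rather than executing it lets you review each step and wait for the node states between commands.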

| username: caiyfc

First, you definitely need to check the issue to see whether it’s a known bug, understand the cause, and find out which version contains the fix. If it’s not a known issue, it should be reported. Simply scaling out and in might just lead you into the same issue again next time. As for deleting the panic_mark_file, I think it’s quite likely that TiKV still won’t start even after you delete it, but you might get more detailed error information.