After Being Taken Offline, a Node Remains in Pending Offline Status and Has Not Cleared After 1 Day

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 节点下线后,处于Pending Offline无法消失,等了1天

username: LBX流鼻血

[TiDB Usage Environment] Production
[TiDB Version] 6.1.0
[Encountered Problem: Phenomenon and Impact] A node had a problem, and after it was taken offline it has remained in Pending Offline status. The node's information shows that its size is already zero, but region_count is still 3499. The Pending Offline status will not clear. How can I make it go away?

username: h5n1

Region migration is slow. As long as region_count keeps decreasing, there is no problem.
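
One way to tell "slow but still progressing" from "stuck" is to compare successive region_count samples. The following is a minimal sketch only: the function name, the window size, and every sample value other than 3499 are assumptions for illustration, and collecting the samples (for example from pd-ctl or Grafana) is left out.

```python
from typing import Sequence


def is_stalled(region_counts: Sequence[int], window: int = 3) -> bool:
    """Return True if region_count did not drop across the last `window` samples.

    `region_counts` is ordered oldest to newest. A store whose count has
    already reached 0 is treated as finished, not stalled.
    """
    if len(region_counts) < window:
        return False  # not enough samples to judge yet
    recent = region_counts[-window:]
    return recent[-1] > 0 and recent[-1] >= recent[0]


# Hypothetical samples taken some minutes apart; only 3499 comes from the thread.
samples = [3600, 3499, 3499, 3499]
print(is_stalled(samples))
```

If the count is flat over a long enough window, waiting longer is unlikely to help and the Regions need to be examined directly.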

username: LBX流鼻血

The count has been stuck at 3499 for a very long time.

username: h5n1

Read the article first and try scheduling the Regions manually.

username: Kongdom

You can check the Pending Offline progress in Grafana. The data volume may be large, so the migration may simply not have finished yet. I have seen a node stay pending for a day. There are parameters that can be adjusted to speed it up.

username: tidb菜鸟一只

Since the node has no leaders left, could we be bold and use --force to take it offline directly?

username: LBX流鼻血

@h5n1 @tidb菜鸟一只 Yes, but we don't dare to do that in production. The region_count is stuck at 3499, and the Regions cannot be scheduled manually. They are all empty.

username: Billmay表妹

Check out this article: 专栏 - tikv下线Pending Offline卡住排查思路 | TiDB 社区 (Column: Troubleshooting a TiKV store stuck in Pending Offline | TiDB Community)

username: h5n1

Region migration and leader migration are two different things. Even if the leader count is 0, you still have to wait for region_count to reach 0. What you need to do now is manually trigger the Region migration, following the article mentioned earlier. The --force option only removes the node from tiup; it does not actually remove it from the cluster. Don't use it lightly.
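
As a rough illustration of triggering the migration manually, the sketch below turns the JSON printed by pd-ctl's `region store <store_id>` into one `operator add remove-peer <region_id> <store_id>` command per remaining Region. The JSON shape (a top-level "regions" list whose items carry an "id"), the store ID 5, and the Region IDs are assumptions for illustration, and the function name is hypothetical. PD may still reject an operator, for example when a Region has lost its quorum.

```python
import json
from typing import List


def remove_peer_commands(region_store_json: str, store_id: int) -> List[str]:
    """Build pd-ctl 'operator add remove-peer' commands for a store.

    `region_store_json` is assumed to be the output of
    `region store <store_id>`; an absent or null "regions" field
    is treated as no remaining Regions.
    """
    data = json.loads(region_store_json)
    regions = data.get("regions") or []
    return [f"operator add remove-peer {r['id']} {store_id}" for r in regions]


# Hypothetical input: two Regions still have a peer on store 5.
sample = '{"count": 2, "regions": [{"id": 101}, {"id": 202}]}'
for cmd in remove_peer_commands(sample, store_id=5):
    print(cmd)
```

Each printed line is meant to be run inside pd-ctl. Reviewing the list before running anything is the safer choice in production.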

username: redgame

There have been cases where an abnormal PD cluster state caused nodes to fail to go offline properly.

username: LBX流鼻血

All values are now 0, but the status is still Pending Offline. In addition, I checked these two TiKV server processes, and they keep restarting in a loop: down, restart, down. The logs print errors continuously. Please take a look, thank you very much.

username: Jellybean

Have you tried the method h5n1 describes in this article? Take a look:

专栏 - TiKV缩容下线异常处理的三板斧 | TiDB 社区 (Column: Three techniques for handling abnormal TiKV scale-in and offline | TiDB Community)

username: h5n1

It looks like you have a separate thread for the TiKV restart issue. You can stop those two TiKV nodes and use the Online Unsafe Recovery feature available in version 6.1.
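
For reference, Online Unsafe Recovery in 6.1 is driven through pd-ctl's `unsafe remove-failed-stores <store_id1,store_id2,...>` command. The sketch below only assembles the argument list and does not run it. The PD address and the store IDs 4 and 7 are hypothetical, and the helper name is an assumption. This recovery can lose data, so the failed stores should already be stopped and the official documentation checked before use.

```python
from typing import List, Sequence


def unsafe_recover_args(
    pd_addr: str,
    failed_store_ids: Sequence[int],
    version: str = "v6.1.0",
) -> List[str]:
    """Assemble a tiup/pd-ctl command line for Online Unsafe Recovery.

    Nothing is executed here; the caller decides whether to run it.
    """
    if not failed_store_ids:
        raise ValueError("at least one failed store ID is required")
    ids = ",".join(str(s) for s in failed_store_ids)
    return [
        "tiup", f"ctl:{version}", "pd", "-u", pd_addr,
        "unsafe", "remove-failed-stores", ids,
    ]


# Hypothetical PD endpoint and store IDs, for illustration only.
args = unsafe_recover_args("http://127.0.0.1:2379", [4, 7])
print(" ".join(args))
```

Progress can then be followed with `unsafe remove-failed-stores show` in pd-ctl.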

username: LBX流鼻血

Thank you, expert. Online Unsafe Recovery is very useful.
