Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
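Original topic: A TiKV node being taken offline stays in the Offline state; the server instance on another port is alive, but monitoring reports it as not online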
【TiDB Usage Environment】Production Environment
【TiDB Version】
【Reproduction Path】View on the monitoring page
【Encountered Problem: A TiKV node was taken offline, but it stays stuck in the Offline state and never finishes. A server instance on another port is alive, but monitoring reports it as not online.】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshot/Logs/Monitoring】
Check the service status of the node to see if it is alive…
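A quick way to check liveness from another machine is to test whether the instance's port accepts TCP connections. This is only a minimal sketch: the helper name `is_port_open` and the address in the comment are made up for illustration, and an open port only shows that the process is listening, not that PD considers the store healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout, or unreachable host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (hypothetical host; 20160 is the default TiKV service port):
# is_port_open("10.0.1.3", 20160)
```

If the port is open but monitoring still shows the instance as not online, the next step is the logs of that instance.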
The most direct way is to look at the logs.
First, back up the data, then add a new node and see if that helps. With only 3 TiKV nodes, the data on the node being taken offline has nowhere to go.
If you have only 3 TiKV instances with the default 3 replicas and you want to remove one, the removal will likely stay pending, because the remaining 2 nodes cannot hold 3 replicas of each Region.
You need to scale out first and then scale in, so that the cluster still has enough nodes to meet the minimum requirement of 3 replicas.
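The arithmetic behind this advice can be written down as a small check. This is a simplified sketch, not PD's actual scheduling logic: the function name is hypothetical, and it ignores location labels, placement rules, and disk capacity.

```python
def can_finish_scale_in(up_stores: int, max_replicas: int = 3, removing: int = 1) -> bool:
    """Return True if enough TiKV stores remain to hold every replica.

    Simplified rule: each replica of a Region must live on a different store,
    so at least `max_replicas` stores must remain after the removal.
    """
    return up_stores - removing >= max_replicas


print(can_finish_scale_in(up_stores=3))  # 3 nodes, remove 1: only 2 left for 3 replicas
print(can_finish_scale_in(up_stores=4))  # scale out to 4 first, then remove 1: 3 left
```

With 3 nodes the check fails, which matches a store that stays Offline indefinitely; after scaling out to 4 it passes.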
There’s no place for the data to go.
Try to find another machine and expand TiKV by one more node first.
Is it possible for an offline node to still have logical processing?
Check with pd-ctl: if the replica count is 3 and there are only 3 TiKV nodes, the data can't be moved off the node.
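As a rough illustration of what to look for, the sketch below summarizes store states from JSON shaped like the output of pd-ctl's `store` command. The sample data is invented, and the field names used (`stores`, `store.state_name`, `status.region_count`) are assumptions about that output that should be verified against your own cluster; the replica count is passed in by hand rather than read from PD.

```python
import json

# Invented sample, shaped like (an assumption about) `pd-ctl store` output.
SAMPLE_STORE_JSON = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "10.0.1.1:20160", "state_name": "Up"},
     "status": {"region_count": 120}},
    {"store": {"id": 4, "address": "10.0.1.2:20160", "state_name": "Up"},
     "status": {"region_count": 121}},
    {"store": {"id": 5, "address": "10.0.1.3:20160", "state_name": "Offline"},
     "status": {"region_count": 118}}
  ]
}
"""


def summarize_stores(store_json: str, max_replicas: int) -> dict:
    """Count Up stores and list Offline stores with the Regions still on them."""
    data = json.loads(store_json)
    up_count = 0
    draining = {}  # store id -> Regions not yet migrated away
    for item in data.get("stores", []):
        store = item["store"]
        state = store.get("state_name")
        if state == "Up":
            up_count += 1
        elif state == "Offline":
            draining[store["id"]] = item.get("status", {}).get("region_count", 0)
    return {
        "up": up_count,
        "draining": draining,
        # Simplified: the Up stores alone must be able to hold all replicas.
        "enough_targets": up_count >= max_replicas,
    }


print(summarize_stores(SAMPLE_STORE_JSON, max_replicas=3))
```

In the sample, store 5 is Offline with Regions still on it and only 2 stores are Up, so there are not enough targets for 3 replicas and the Region count on store 5 would not be expected to drop.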
Check the logs of the problematic node.