Several TiKV Nodes Disconnected After Pruning the Cluster

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.0.6
[Reproduction Path] We scaled in TiFlash and waited until the node's replica count dropped to 0 and the store entered the Tombstone state. We then ran tiup cluster prune.
[Encountered Problem: Phenomenon and Impact] Several TiKV nodes disconnected, and the errors referenced a store ID that no longer exists.
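Before running prune, it can help to confirm from PD's own view that the scaled-in store really is Tombstone with no regions left. The sketch below is a minimal, hypothetical helper (ready_to_prune is not part of TiUP or PD). It assumes the JSON shape returned by pd-ctl's store command or PD's /pd/api/v1/stores endpoint, with the field names "stores", "store", "id", "state_name", "status" and "region_count"; check those against your PD version before relying on it. The addresses in the sample are made up.

```python
import json


def ready_to_prune(stores_json: str, store_id: int) -> bool:
    """Return True if PD reports store_id as Tombstone with zero regions.

    Assumes the response looks like {"stores": [{"store": {...}, "status": {...}}]}.
    """
    data = json.loads(stores_json)
    for entry in data.get("stores", []):
        store = entry.get("store", {})
        if store.get("id") != store_id:
            continue
        status = entry.get("status", {})
        # A missing region_count is treated as 0 (assumption about the response).
        return (store.get("state_name") == "Tombstone"
                and status.get("region_count", 0) == 0)
    # PD does not list the store at all, so there is nothing left to prune.
    return False


# Hypothetical sample response: one live TiKV store and one tombstoned TiFlash store.
sample = json.dumps({"count": 2, "stores": [
    {"store": {"id": 1, "address": "tikv-1:20160", "state_name": "Up"},
     "status": {"region_count": 120}},
    {"store": {"id": 7, "address": "tiflash-1:3930", "state_name": "Tombstone"},
     "status": {"region_count": 0}},
]})

print(ready_to_prune(sample, 7))
print(ready_to_prune(sample, 1))
```

Returning False for an unlisted store is a deliberate choice: in this thread the trouble was precisely a store ID that PD tools no longer showed, so an unlisted ID is worth investigating rather than pruning.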

[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

Let's check which node that store ID belongs to.
This doesn’t work.

The first pd-ctl command returned nothing and had to be killed; the second one also returned 0.

Are you willing to try a restart? Restart the PD leader so that leadership moves to another node, then check again.

Is there a rationale behind this? I also don't see this store when I query the API on each PD node separately.


This issue is fairly complex and is probably related to the one we supported last night:
TiKV disconnections and elevated PD metrics: So far we have determined that TiKV disconnected because its raftstore was overloaded. The raftstore was overloaded because silent (hibernated) regions were woken up and kept sending requests to PD, which drove raftstore CPU up until the node lost its connection (PD metrics rose at the same time). Why the silent regions were woken up has not been confirmed yet; we currently suspect a bug triggered by taking TiFlash offline. I will post a clear conclusion once we have one.

Thanks a lot!

There are two issues with the cluster:

  1. Abnormal store offlining: this behavior is caused by known bugs that are not fixed in v5.0.6. To avoid them completely, you need to upgrade to a TiDB version that includes the fixes. See "Removed tombstone stores show again if transfer pd leader during scale in" (tikv/pd Issue #4941 on GitHub) and "PD client keeps reconnecting on error StoreTombstone" (tikv/tikv Issue #12506 on GitHub).
  2. Elevated PD metrics: these are caused by the two bugs above, which keep treating the already tombstoned TiKV store as if it were still present. The metrics will return to normal once the store is properly taken offline.
