Question About the TiKV Replica Replenishment Mechanism

I have a question:
I have 3 TiKV instances, Regions use 3 replicas, and labels are configured for host-level isolation. If one TiKV instance goes down and does not recover within the max-store-down-time setting, what happens next?

The peer scheduler in PD starts scheduling and adds replicas on other hosts.

Normally, once max-store-down-time is exceeded, replicas are added to other TiKV nodes. However, with 3 replicas and 3 TiKV nodes, if one node fails there is nowhere to place the new replica, so the failed node will remain in an offline state. You need to add another node: the number of surviving TiKV nodes must be greater than or equal to the number of replicas.
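To make the "nowhere to place the replica" point concrete, here is a minimal Python sketch. It is a toy model, not PD's actual scheduling code: the `Store` class and `candidate_stores` function are hypothetical names, and it assumes the host label is treated as a strict constraint (one peer of a Region per host).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    """Hypothetical, simplified view of a TiKV store as PD sees it."""
    store_id: int
    host: str   # value of the host label
    up: bool    # True if the store is still sending heartbeats


def candidate_stores(stores, peer_store_ids):
    """Return Up stores that could take a replacement peer for one Region.

    Assumption: a store is eligible only if it is Up, does not already
    hold a peer of the Region, and sits on a host that has no healthy peer.
    """
    used_hosts = {s.host for s in stores if s.up and s.store_id in peer_store_ids}
    return [
        s for s in stores
        if s.up and s.store_id not in peer_store_ids and s.host not in used_hosts
    ]


# 3 stores, 3 replicas, store 3 has failed: no candidate exists.
cluster = [Store(1, "h1", True), Store(2, "h2", True), Store(3, "h3", False)]
print([s.store_id for s in candidate_stores(cluster, {1, 2, 3})])  # prints []

# After adding a fourth store on a new host, the replica can be replenished.
cluster.append(Store(4, "h4", True))
print([s.store_id for s in candidate_stores(cluster, {1, 2, 3})])  # prints [4]
```

In this model the Region keeps running on its two healthy peers, but it stays under-replicated until a store like store 4 appears.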

What does “offline” mean? If the third TiKV does not come back online for a long time, will it affect the normal operation of the cluster?

Offline means the store is not available; it is an intermediate state. In this state, TiKV will perform balancing. If one out of three nodes goes down, it won’t affect the cluster’s operation, but the node should be replaced promptly in case another one goes down.

If there are 3 TiKV nodes and one of them fails, it will not affect normal operations.
Offline: When a TiKV Store is manually taken offline through PD Control, the Store enters the Offline state. This is only an intermediate state on the way to the Store being removed. In this state, the Store moves all its Regions to other Stores in the Up state that meet the relocation conditions.
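Since "offline" is used in two senses in this thread, a rough sketch of how the store states differ may help. This is my reading of the PD documentation, not PD's code: `classify_store` and its parameters are hypothetical, the 20-second Disconnect threshold and the 30-minute default for max-store-down-time are assumptions taken from the docs as I remember them, and the Tombstone rule is simplified.

```python
DISCONNECT_AFTER_S = 20  # assumed heartbeat-loss threshold for "Disconnect"


def classify_store(since_heartbeat_s, max_store_down_time_s=1800,
                   manually_removed=False, region_count=0):
    """Return a simplified PD store state name.

    manually_removed models a store taken offline via PD Control;
    region_count is the number of Regions still on that store.
    """
    if manually_removed:
        # Offline is the intermediate state while Regions are moved away;
        # once nothing is left, the store is treated as Tombstone.
        return "Tombstone" if region_count == 0 else "Offline"
    if since_heartbeat_s > max_store_down_time_s:
        # A crashed store that stays silent past max-store-down-time.
        return "Down"
    if since_heartbeat_s > DISCONNECT_AFTER_S:
        return "Disconnect"
    return "Up"


print(classify_store(3600))                                      # prints Down
print(classify_store(5, manually_removed=True, region_count=10)) # prints Offline
```

Under these assumptions, the failed node in the original question would be reported as Down rather than Offline, because nobody removed it through PD Control.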

