Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIKV可用性 (TiKV Availability)
【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.0
【Encountered Issue】
I would like to ask, if a TiKV cluster has 6 nodes and 2 TiKV nodes go down, will the cluster become unavailable? Furthermore, if there are 9 TiKV nodes and at least 2 TiKV nodes go down, will it also become unavailable? Can TiKV only tolerate the failure of at most 1 node at a time?
TiDB defaults to 3 replicas. Because Raft requires a majority, at most 1 replica of any given region can fail. How many whole TiKV nodes can fail also depends on the deployment.
For example:
- With 6 TiKV nodes on 6 hosts: under 3 replicas, only 1 TiKV node (i.e., 1 host) can fail without affecting availability.
- With 6 TiKV nodes on 3 hosts (2 nodes per host): in such deployments, the TiKV instances are usually labeled with their host, so that replicas of the same region are never placed on the same host. If one host fails, 2 TiKV nodes go down, but availability is not affected.
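To make the label idea concrete, here is a minimal sketch (not TiKV's actual PD scheduler, and the topology is hypothetical) of host-aware placement: with 6 stores on 3 hosts, pick 3 replicas so that no two replicas of the same region share a host.

```python
stores = {  # store_id -> host label (hypothetical 6-store, 3-host topology)
    1: "host1", 2: "host1",
    3: "host2", 4: "host2",
    5: "host3", 6: "host3",
}

def place_replicas(stores, replica_count=3):
    """Greedily pick one store per distinct host until we have enough replicas."""
    placement, used_hosts = [], set()
    for store_id, host in stores.items():
        if host not in used_hosts:
            placement.append(store_id)
            used_hosts.add(host)
        if len(placement) == replica_count:
            break
    return placement

replicas = place_replicas(stores)
# All three replicas land on distinct hosts, so losing one host
# (two stores) removes at most one replica of this region.
assert len({stores[s] for s in replicas}) == 3
```

With this constraint in place, a single host failure takes out both of its TiKV stores but at most one replica of any region, so every Raft group keeps its majority.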
Whether the TiKV cluster stays available depends on your replica settings, not on the number of nodes.
As shown in the drawing above: 1 2 3 4 5 6 represent 6 physical server nodes. Each color represents one Raft group, namely a, b, c, d (L stands for leader, f for follower; aL is the leader of group a, and f1 and f2 are its other 2 replicas).
Example of 2 nodes going down:
- If nodes 1 and 2 are down, group a loses its majority and can no longer elect a leader or guarantee consistency.
- If nodes 3 and 4 are down, this problem does not occur.
Also, because consistency and leader election both require a majority, the replica count is generally odd.
The drawing is not very good, but that’s the idea. Hope you understand.
This depends on the specific deployment architecture. As for "if at least 2 out of 9 TiKV nodes fail, will the cluster be unusable?" — are you planning to set up 9 replicas?
Isn't the default number of replicas three? Aren't regions automatically distributed across the TiKV nodes? If a region happens to have 2 of its replicas on the 2 nodes that went down, wouldn't the cluster be down?
I currently have 6 TiKV hosts, with 1 TiKV node on each host.
This depends on how you configure your replicas. By default, there are 3 replicas, which is unrelated to the number of machines. After version 6.0, you can configure more replicas for key regions.
Regions are automatically distributed. If the number of nodes is greater than the number of replicas, then no two replicas of the same Raft group will be placed on the same node.
Of course, what I mentioned above does not cover special lab-only configurations.
According to your example, with 6 physical nodes and the default 3 replicas, it is impossible for 2 replicas of one region to be placed on the same node; if that happened, majority consensus and leader election could not be achieved.
My understanding is that if some regions happen to have two replicas on the two failed nodes, the cluster will have problems. Since regions are distributed roughly at random, when two nodes fail, some region is very likely to lose its majority, making the cluster unavailable.
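That intuition can be checked with some back-of-the-envelope probability, under the simplifying assumption that each region's 3 replicas are placed uniformly and independently over the 6 nodes (real PD scheduling also balances load, but the conclusion is the same):

```python
from math import comb

# P(a specific region loses quorum when 2 of 6 nodes fail):
# the region is down iff both failed nodes hold one of its 3 replicas,
# i.e. the third replica is on any of the 4 surviving nodes.
nodes, replicas, failed = 6, 3, 2
p_region_down = comb(nodes - failed, replicas - failed) / comb(nodes, replicas)
print(p_region_down)  # 4 / 20 = 0.2 per region

# A real cluster has thousands of regions, so the chance that
# *every* region keeps its majority vanishes quickly:
regions = 1000  # assumed region count for illustration
p_cluster_ok = (1 - p_region_down) ** regions
print(p_cluster_ok)  # 0.8 ** 1000: effectively zero
```

So with 6 nodes, 3 replicas, and any realistic number of regions, losing 2 nodes almost certainly makes some regions unavailable — which matches the conclusion above.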
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.