TiKV Availability

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV可用性

| username: TiDBer_pkQ5q1l0

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.0
【Encountered Issue】
I would like to ask: if a TiKV cluster has 6 nodes and 2 TiKV nodes go down, will the cluster become unavailable? Likewise, with 9 TiKV nodes, if 2 or more TiKV nodes go down, will it also become unavailable? Can TiKV only tolerate the failure of at most 1 node at a time?

| username: h5n1 | Original post link

TiDB defaults to 3 replicas. Because Raft requires a majority of replicas to stay alive, at most 1 replica of any Region may be lost. How many TiKV nodes can fail without impact therefore depends on the deployment.

For example:

  • With 6 TiKV nodes on 6 hosts: under 3 replicas, only 1 TiKV node (i.e., 1 host) can fail without affecting availability.
  • With 6 TiKV nodes on 3 hosts (2 nodes per host): in such deployments the TiKV instances are usually labeled with their host, so that replicas of the same Region are never placed on the same host. If one host fails, 2 TiKV nodes go down, but each Region loses at most 1 replica, so availability is not affected (see the sketch after this list).
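
As an illustration of the majority arithmetic behind both cases, here is a minimal sketch (plain Python, not TiKV code; the host names are made up):

```python
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # Raft needs a majority alive: 2 of 3

def region_available(replica_hosts, failed_hosts):
    """A Region stays available while a majority of its replicas sit on live hosts."""
    alive = [h for h in replica_hosts if h not in failed_hosts]
    return len(alive) >= QUORUM

# Second deployment above: 3 hosts, 2 TiKV instances per host, host labels set,
# so PD keeps each of a Region's 3 replicas on a different host.
region = ["host-1", "host-2", "host-3"]

print(region_available(region, {"host-1"}))            # True: losing 1 host (2 TiKV) leaves 2 of 3 replicas
print(region_available(region, {"host-1", "host-2"}))  # False: losing 2 hosts leaves only 1 replica
```
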
| username: xiaohetao | Original post link

How many TiKV nodes can fail without losing availability depends on your replica settings, not on the number of nodes.

| username: xiaohetao | Original post link

In the figure above, 1, 2, 3, 4, 5, 6 represent 6 physical server nodes. Each color represents one Raft group: a, b, c, d (L stands for leader, f for follower; aL is the leader of group a, and f1 and f2 are its other 2 replicas).

Example of node failures (2 nodes down):
If nodes 1 and 2 go down, group a can no longer reach a majority, so it cannot elect a leader or guarantee consistency.
If nodes 3 and 4 go down instead, this problem does not occur.

Also, because of the majority requirement for consistency and leader election, the number of replicas is generally odd.
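
To make this concrete, here is a small sketch in plain Python. The placements are assumed for illustration (the original figure is not reproduced here); it just checks which Raft groups lose their majority for a given pair of failed nodes:

```python
QUORUM = 2  # majority of 3 replicas

# Assumed placement in the spirit of the figure: group a on nodes 1, 2, 3;
# the other groups spread across the remaining nodes.
placement = {
    "a": {1, 2, 3},
    "b": {2, 4, 6},
    "c": {3, 5, 6},
    "d": {1, 4, 5},
}

def groups_without_majority(failed_nodes):
    return [g for g, nodes in placement.items()
            if len(nodes - failed_nodes) < QUORUM]

print(groups_without_majority({1, 2}))  # ['a'] -> group a loses its majority
print(groups_without_majority({3, 4}))  # []    -> every group still has 2 of 3 replicas
```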

| username: xiaohetao | Original post link

The drawing is not very good, but that’s the idea. Hope you understand.

| username: alfred | Original post link

This depends on the specific deployment architecture. “If at least 2 out of 9 TiKV nodes fail, will it be unusable?” Are you planning to set up 9 replicas?

| username: TiDBer_pkQ5q1l0 | Original post link

Isn’t the default number of replicas three? Aren’t regions automatically distributed according to the number of TiKV nodes? If a region happens to have 2 replicas on the 2 nodes that are down, wouldn’t the cluster be down?
I currently have 6 TiKV hosts, with 1 TiKV node on each host.

| username: wuxiangdong | Original post link

This depends on how you configure your replicas. By default, there are 3 replicas, which is unrelated to the number of machines. After version 6.0, you can configure more replicas for key regions.

| username: xiaohetao | Original post link

Regions are distributed automatically. As long as the number of nodes is greater than the number of replicas, the replicas of one Region (one Raft group) will not be placed on the same node.

| username: xiaohetao | Original post link

Of course, what I mentioned above does not cover special configurations used in test or lab environments.

| username: xiaohetao | Original post link

In your example, with 6 physical nodes and the default 3 replicas, it is impossible for 2 replicas of one Region to be placed on the same node. If that could happen, a single node failure could leave that Region unable to reach majority consensus or elect a leader.

| username: TiDBer_pkQ5q1l0 | Original post link

My understanding is that if some Regions happen to have 2 of their replicas on the 2 failed nodes, those Regions will have problems. Since Regions are distributed more or less randomly, if 2 nodes in the cluster fail, the cluster is very likely to become unavailable.

| username: xiaohetao | Original post link

Yes.
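
A rough back-of-the-envelope calculation of how likely that is (plain Python; it treats replica placement as a uniformly random choice of 3 distinct nodes and Regions as independent, both simplifications of what PD actually does):

```python
from math import comb

nodes, replicas = 6, 3   # 6 TiKV hosts, default 3 replicas
failed = 2               # 2 hosts go down

# Probability that one Region has 2 of its 3 replicas on the 2 failed hosts
# (the 3 replicas always sit on 3 distinct hosts, so "2 on the failed pair"
# is the only way a Region can lose its majority here).
p_broken = comb(failed, 2) * comb(nodes - failed, replicas - 2) / comb(nodes, replicas)
print(p_broken)          # 0.2 -> a 20% chance per Region

# With many Regions, the chance that none of them loses its majority shrinks fast.
for n_regions in (10, 100, 1000):
    print(n_regions, 1 - (1 - p_broken) ** n_regions)
# 10    ~0.89
# 100   ~1.0
# 1000  ~1.0
```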

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.