If there are 3 TiKV nodes, does it mean that not a single TiKV can go down?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv3个节点,是不是1个tikv都不能宕掉

| username: TiDBer_Y2d2kiJh

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.0 2tidb 3pd 3tikv
[Reproduction Path] I would like to ask: if one of the three TiKV servers goes down, will the entire TiDB cluster have issues writing data? To tolerate one TiKV server going down, do we need four TiKV nodes?

| username: Billmay表妹 | Original post link

I remember that the 3-node setup is meant to keep the cluster available even if one node crashes.

So always keep 3 available TiKV nodes. If you know a certain TiKV node is broken, replace it with a healthy one by scaling out first and then scaling in.
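The scale-out-then-scale-in replacement described above can be sketched with `tiup cluster` commands. This is a hedged outline only; the cluster name `mycluster`, the topology file name, and the node address are hypothetical placeholders:

```shell
# 1. Scale out: add a healthy TiKV node first (new node is described in scale-out.yaml).
tiup cluster scale-out mycluster scale-out.yaml

# 2. After Regions have rebalanced, scale in the broken TiKV node.
tiup cluster scale-in mycluster --node 10.0.1.3:20160

# 3. Check status; the removed node should pass through Offline to Tombstone.
tiup cluster display mycluster
```

Scaling out before scaling in matters: with only 3 TiKV nodes and 3 replicas, removing a node first would leave PD nowhere to place the third replica.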

| username: 啦啦啦啦啦 | Original post link

With 3 replicas, one machine going down causes no issues, but if a second machine fails the cluster becomes unavailable. Also note that after one of the three machines goes down, you need to add a new machine before you can decommission the old one.

| username: Anna | Original post link

The cluster remains usable, but the failed node needs to be repaired promptly.

| username: Timothy | Original post link

TiDB keeps three replicas of each Region by default: one leader and two followers, distributed across different nodes. A write succeeds once it is persisted on a majority, i.e. at least two of the three replicas, so data integrity is preserved if any single node fails. PD re-elects the Region leaders that were on the failed machine onto the two surviving machines, and the cluster continues to function normally. However, the load on the remaining two TiKV nodes will increase, since the same work is spread over fewer machines.
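The majority rule above can be made concrete with a small sketch. This is illustrative arithmetic, not TiDB code; the function names are my own:

```python
# Sketch: fault tolerance of a majority-quorum (Raft-style) replica group.
def quorum(replicas: int) -> int:
    """Minimum number of replicas that must acknowledge a write."""
    return replicas // 2 + 1

def max_failures(replicas: int) -> int:
    """Replicas that can be lost while the group can still reach quorum."""
    return replicas - quorum(replicas)

for n in (3, 5):
    print(f"{n} replicas: quorum={quorum(n)}, tolerates {max_failures(n)} failure(s)")
# 3 replicas: quorum=2, tolerates 1 failure(s)
# 5 replicas: quorum=3, tolerates 2 failure(s)
```

This is why 3 TiKV nodes tolerate exactly one node failure, and why adding a fourth node does not raise the failure tolerance unless the replica count is also raised to 5.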

It is recommended to regularly inspect the TiDB dashboard and Grafana monitoring, or set up alerts.

| username: Kongdom | Original post link

If there are 3 replicas evenly distributed, it is not a problem if one TiKV node goes down.

| username: zhanggame1 | Original post link

One of the three TiKV nodes can fail, but it must be detected and repaired as soon as possible. If a second one fails, the cluster becomes unavailable and data may be lost.

| username: 孤君888 | Original post link

A distributed database definitely tolerates single-node failures.

| username: linnana | Original post link

In a production environment, it is generally not recommended to co-locate multiple components on the same host (mixed deployment), but it can work.

| username: 我是咖啡哥 | Original post link

To be precise: with 3 TiKV instances, losing 1 instance is not a problem. But if there are 3 servers each deploying multiple instances, and no labels are configured, two replicas of the same Region may land on two instances on the same server; in that case, one server going down will cause problems.
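The label configuration mentioned above can be sketched in a `tiup` topology file. This is a hedged fragment assuming a multi-instance deployment; the IPs, ports, and label values are hypothetical:

```yaml
server_configs:
  pd:
    # Tell PD to treat "host" as an isolation level when placing replicas.
    replication.location-labels: ["host"]

tikv_servers:
  # Two TiKV instances on the same physical server share the same label,
  # so PD will not put two replicas of one Region on this server.
  - host: 10.0.1.1
    port: 20160
    config:
      server.labels: { host: "server1" }
  - host: 10.0.1.1
    port: 20161
    config:
      server.labels: { host: "server1" }
  - host: 10.0.1.2
    port: 20160
    config:
      server.labels: { host: "server2" }
```

With labels in place, PD spreads the 3 replicas across 3 labeled servers, so a single server failure takes out at most one replica of any Region.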

| username: Hacker007 | Original post link

The cluster stays usable, but any failure needs to be fixed quickly.

| username: Fly-bird | Original post link

Only one node can be down at a time.

| username: redgame | Original post link

One node can go down.

| username: 南征北战 | Original post link

One node can go down, but it must be fixed as soon as possible.
Using an even number of nodes in the cluster may cause a “split-brain” issue, resulting in multiple leaders and data inconsistency.

| username: zhanggame1 | Original post link

I tried it: the database still runs normally with one of the three TiKV nodes down, but it stops working with two down.

| username: linnana | Original post link

TiKV also follows the majority (quorum) principle.

| username: Anna | Original post link

You can.

| username: xfworld | Original post link

If possible, prepare 5 nodes; then you can choose between three replicas and five replicas…

This way, you won’t have to worry.
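For reference, the replica count is a PD setting. A hedged sketch of raising it from 3 to 5 with `pd-ctl` (the version tag and PD address are placeholders for this cluster):

```shell
# Raise the default replica count to 5 (only sensible with >= 5 TiKV nodes).
tiup ctl:v5.4.0 pd -u http://127.0.0.1:2379 config set max-replicas 5
```

With 5 replicas, the cluster can tolerate 2 node failures instead of 1, at the cost of extra storage and replication traffic.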

| username: zhanggame1 | Original post link

Having too many replicas can affect performance: the more replicas there are, the more copies of each modification must be persisted, which consumes more resources.

| username: xfworld | Original post link

High availability and backup mechanisms do require more resources. Handling data-modification requests and synchronizing data between nodes are two different things.

As for resource consumption, it is a balance between business requirements, performance requirements, and disaster-recovery requirements; it cannot be summed up in one sentence.