TiKV with three replicas: If there are only three TiKV machines and one goes down, how will the three replicas operate?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv三副本,加入只有三台tikv机器,宕掉一台,三个副本会怎么操作?

| username: 特雷西-迈克-格雷迪

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Encountered Problem]
[Reproduction Path] Default three replicas, and only three TiKV instances. If one TiKV instance goes down, will the cluster still function normally?
[Problem Phenomenon and Impact]

[Attachment]

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: xfworld | Original post link

It won’t be normal; you need to add a TiKV node immediately to restore the three-instance state. Once region replenishment and scheduling are complete, the cluster will return to normal.

Production environments generally use a five-node, five-replica deployment.

| username: 特雷西-迈克-格雷迪 | Original post link

With 5 replicas, if one TiKV instance goes down, does the entire cluster become unusable?

| username: Raymond | Original post link

That definitely won’t happen. With 5 replicas, it can tolerate 2 replica failures.

| username: 啦啦啦啦啦 | Original post link

With TiKV’s three replicas on three TiKV machines, if one machine goes down, there will only be 2 replicas left. The cluster can still function normally at this point, but if another machine goes down, it will no longer be able to provide normal service. Therefore, it is necessary to add another KV node.

| username: cs58_dba | Original post link

A triangle is the most stable.

| username: gary | Original post link

Leaders hosted on the crashed node are re-elected and rebalanced across the remaining two machines, so the cluster can keep functioning normally.

| username: alfred | Original post link

So as long as a majority of the KV nodes are alive, the cluster can still function normally.

| username: 啦啦啦啦啦 | Original post link

Not necessarily. That holds for a three-node setup. With more nodes, a three-replica cluster can still hit anomalies when two KV nodes go down, because some regions may have two of their three replicas on the failed nodes.
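
A toy simulation of this point (hypothetical replica placement, not TiKV’s actual scheduler): with five stores and three-replica regions, one failed store never breaks quorum, but two failed stores knock out every region that happened to have two of its replicas on them.

```python
import itertools

# Toy model: 5 stores, one 3-replica region per possible placement.
# A region can serve only while a majority (2 of 3) of its replicas
# sit on live stores.
stores = {1, 2, 3, 4, 5}
regions = {
    f"region-{i}": set(combo)
    for i, combo in enumerate(itertools.combinations(sorted(stores), 3))
}

def unavailable_regions(failed_stores):
    """Regions that have lost their Raft majority (fewer than 2 live replicas)."""
    live = stores - set(failed_stores)
    return [name for name, replicas in regions.items()
            if len(replicas & live) < 2]

print(unavailable_regions({1}))          # one store down: [] (every region keeps a majority)
print(len(unavailable_regions({1, 2})))  # two stores down: 3 regions lose quorum
```

The second case is exactly the anomaly described above: any region whose placement included both failed stores is left with one live replica and cannot serve.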

| username: Kongdom | Original post link

Under normal circumstances, with three replicas and three TiKV instances, if one TiKV instance goes down, the cluster remains operational.

| username: 张雨齐0720 | Original post link

By the majority-rule principle, service can be provided as long as more than half of the replicas are alive.
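
The majority rule comes down to simple quorum arithmetic. A minimal sketch (generic Raft-style math, not TiKV source code):

```python
def max_tolerated_failures(replicas: int) -> int:
    """A Raft group with n replicas needs a majority (n // 2 + 1)
    alive to serve, so it tolerates (n - 1) // 2 failed replicas."""
    return (replicas - 1) // 2

for n in (3, 5):
    print(f"{n} replicas -> tolerates {max_tolerated_failures(n)} failure(s)")
# 3 replicas -> tolerates 1 failure(s)
# 5 replicas -> tolerates 2 failure(s)
```

This matches the answers above: three replicas survive one failure, five replicas survive two.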

| username: Lystorm | Original post link

If one node goes down, TiKV can read normally, but writing will be abnormal.

| username: 特雷西-迈克-格雷迪 | Original post link

Does anyone have production experience? Why does everyone say something different?

| username: h5n1 | Original post link

With 3 replicas, at most 1 TiKV is allowed to fail; with 5 replicas, at most 2. More precisely, a region tolerates the failure of 1 (of 3) or 2 (of 5) of its replicas; once the failed TiKVs host more than half of a region’s replicas, that region can no longer provide service.

| username: TiDBer_muzijiang | Original post link

If one out of three instances goes down, it can still function normally.

| username: alfred | Original post link

It can work normally: as long as a majority of each region’s replicas are alive, the region can still serve. However, QPS and TPS will be temporarily affected when the failure occurs. In addition, as long as at least one replica of a region survives, the data can be recovered.

So, does the TiDB database design require at least 3 replicas for each region?

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.