Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Due to a network failure, TiDB split from one cluster into two smaller clusters; without considering data consistency, can both small clusters provide services?
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v6.0.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
In the CAP theorem, TiDB prioritizes CP, which means that when a network failure occurs, a cluster may split into two smaller clusters that cannot communicate with each other, and to maintain data consistency only one of them provides services externally. If we do not care about data consistency after the network is restored, is it possible for both small clusters to provide services?
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
PD has only one Leader, and the side where that Leader sits is the one that provides services externally. However, your statement confuses me: if there are two cabinets, each with 3 TiDB, 3 PD, and 3 TiKV, and the two cabinets lose connectivity with each other, does that become two clusters? It would be easy to explain if the number of PD nodes were odd.
The idea is quite bold. With 3 machines and 3 replicas it is theoretically possible: each side could become an independent cluster, and you could even end up with separate clusters each built around a single TiKV. There are ways to achieve this.
It is incorrect to say that one cluster can be turned into three clusters.
The premise is incorrect; network isolation does not turn one cluster into two clusters.
Therefore, a good practice is to require the cluster to run an odd number of PD nodes rather than an even number. When network isolation splits it into two small clusters, exactly one of them holds a majority. If the isolation produces three or more small clusters, it is possible that none of them holds a majority, in which case PD stops serving externally altogether.
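To make the majority rule concrete, here is a minimal sketch (illustrative only, not PD's actual implementation) of why a two-way split of an odd-sized PD cluster always leaves exactly one side with a quorum, while an even-sized one can leave neither side able to elect a leader:

```go
package main

import "fmt"

// quorum returns the minimum number of voting members a Raft group of the
// given size needs in order to elect a leader or commit a proposal.
func quorum(total int) int {
	return total/2 + 1
}

// canServe reports whether a partition holding `members` out of `total`
// voters can still elect a leader and keep serving.
func canServe(members, total int) bool {
	return members >= quorum(total)
}

func main() {
	// 3 PD nodes split 2/1: exactly one side keeps a quorum.
	fmt.Println(canServe(2, 3), canServe(1, 3)) // true false

	// 6 PD nodes split 3/3: neither side reaches the quorum of 4,
	// so PD stops serving until connectivity is restored.
	fmt.Println(canServe(3, 6), canServe(3, 6)) // false false
}
```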
Is there a hard limit on the number of PD nodes? I always thought an odd number was just a recommendation. In a previous project one node went down, leaving only 2 PD nodes, and the even count really did cause problems: leader switching happened frequently, with each node voting only for itself.
Practice yields truth, so you can try it out. But this is not simple cell division, where one cell splits and both daughter cells are fully functional. In short, it depends: if each of the two separated small clusters meets the minimum architecture requirements for deploying a TiDB cluster, it can be made to work, but the data may be incomplete.
That’s not what I meant. The official guidelines do not impose strict limits; they are merely suggested values. It just means that when we deploy, we should follow best practices, which can usually help avoid many issues.
My understanding is that CP sacrifices some availability in order to deal with the partition problem caused by network failures, right?
Brainstorming~ 
Are you planning to split PD? Then change from 3 PD, 2 TiDB, 3 TiKV to two sets of 1 PD, 1 TiDB, 1 TiKV?
What you’re describing is a network partition scenario. The purpose of having 3 replicas on the TiKV side is to handle this issue. No matter how the network is partitioned, one side will have 1 replica and the other side will have 2 replicas.
On the side with 1 replica, data cannot be committed, because a write cannot be acknowledged by a majority of the replicas. The side with 2 replicas can commit data normally, so only that side can provide normal service.
If the side with 1 replica could write data, it would be rolled back according to the Raft protocol once the network is restored.
The Raft protocol is designed to handle this situation. It is not possible to have both sides provide service simultaneously. At least with the current TiKV architecture, this cannot be achieved.
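As a rough illustration of the commit rule described above (a simplified model, not TiKV's actual code): a Raft leader only advances its commit index to an entry once a majority of the region's replicas, itself included, have replicated it, which is why the 1-replica side of a 3-replica region can never commit, and why anything it appended locally is rolled back once the network heals:

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority of a
// region's replicas, which is the furthest point a Raft leader may commit.
// matchIndex holds the last replicated index of every replica, leader included.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	q := len(sorted)/2 + 1
	// The entry at this position is replicated on at least q replicas.
	return sorted[len(sorted)-q]
}

func main() {
	// Majority side of a 3-replica region: the leader and one follower are
	// both at index 7, the isolated replica lags at 5 -> writes keep committing.
	fmt.Println(commitIndex([]int{7, 7, 5})) // 7

	// A leader isolated on the 1-replica side may still append proposals
	// locally (index 9), but the other two replicas are unreachable at 5,
	// so nothing past 5 commits; the local-only entries are discarded when
	// the partition heals and the real leader is seen again.
	fmt.Println(commitIndex([]int{9, 5, 5})) // 5
}
```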
What if the minority side redeploys its own tiup control machine, with a topology file that only includes the minority side's nodes? Would that trick the cluster into starting successfully?
TiKV has a local raft CF that records each region's local state: how many replicas the region has, who its peers are, and which store each peer lives on. Can't fool it that way.
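A simplified model of the point being made here (the type and field names are illustrative, not TiKV's real protobuf definitions): each region's local state records the full peer list with store IDs, so a trimmed-down tiup topology file cannot shrink the voter set that quorum is counted against:

```go
package main

import "fmt"

// Peer identifies one replica of a region and the TiKV store it lives on.
type Peer struct {
	PeerID  uint64
	StoreID uint64
}

// RegionLocalState sketches the per-region metadata every TiKV store keeps
// locally in its raft column family: the complete peer list, regardless of
// what any deployment topology file claims.
type RegionLocalState struct {
	RegionID uint64
	Peers    []Peer
}

// canElectLeader checks whether the stores reachable in this partition hold
// a majority of the peers recorded in the region's local state.
func (s RegionLocalState) canElectLeader(reachableStores map[uint64]bool) bool {
	alive := 0
	for _, p := range s.Peers {
		if reachableStores[p.StoreID] {
			alive++
		}
	}
	return alive >= len(s.Peers)/2+1
}

func main() {
	region := RegionLocalState{
		RegionID: 1,
		Peers:    []Peer{{101, 1}, {102, 2}, {103, 3}},
	}
	// The minority side only reaches store 3. Redeploying tiup with a
	// topology that lists only store 3 does not change region.Peers,
	// so the quorum check still fails.
	fmt.Println(region.canElectLeader(map[uint64]bool{3: true})) // false
}
```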
Thanks for the reminder; here is the situation. We want to deploy TiDB across multiple cross-region data centers, which increases the likelihood of network failures. Suppose there are originally seven PD nodes and a network partition splits them into two smaller clusters, with 4 and 3 PD nodes respectively. The side with 4 PD nodes will keep providing services. Since the side with 3 PD nodes is isolated from the other side, is there any possibility for it to re-elect a leader? Would we need to reset the total number of PD nodes?
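For reference, applying the same majority rule as in the earlier sketch to this scenario (illustrative arithmetic only): with 7 registered PD members the quorum is 4, and a partition does not change the member count PD tracks, so the 4-node side keeps its leader while the 3-node side cannot elect one:

```go
package main

import "fmt"

func main() {
	total := 7            // PD members registered in the cluster
	quorum := total/2 + 1 // 4
	fmt.Println(4 >= quorum) // true: the 4-PD partition can elect a leader
	fmt.Println(3 >= quorum) // false: the 3-PD partition cannot
}
```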
Then could we use the original tiup to add a new TiKV node on the minority side, increase the number of region replicas, let the minority side re-elect a leader, and update the local raft CF?
The newly added TiKV has no regions, so peers have to be added to it. Add-peer tasks are issued by PD to each region's leader TiKV. If the regions on the minority side cannot elect a leader, they cannot receive the add-peer commands issued by PD, so no replicas can be added and the newly added TiKV simply stays empty.
The explanation above assumes no code changes; if you modify the code, anything is possible and it can be implemented however you like.
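A toy sketch of the scheduling flow just described (not PD's real API, just the shape of the argument): PD can only hand an add-peer operator to a region leader, so on a side that cannot elect one, the new empty TiKV never receives any replicas:

```go
package main

import (
	"errors"
	"fmt"
)

// region is a toy model of what PD knows about one region: its peers and
// which replica, if any, is currently the leader. Illustrative only.
type region struct {
	id     uint64
	peers  []uint64 // store IDs holding a replica
	leader uint64   // 0 means no leader could be elected
}

// scheduleAddPeer mimics the shape of PD's scheduling: the add-peer operator
// is handed to the region leader, which then drives the Raft conf change.
// With no leader there is nobody to hand it to.
func scheduleAddPeer(r region, newStore uint64) error {
	if r.leader == 0 {
		return errors.New("no leader: add-peer operator cannot be delivered")
	}
	fmt.Printf("sent AddPeer(store %d) to leader on store %d\n", newStore, r.leader)
	return nil
}

func main() {
	// Minority side: the region's surviving replica cannot win an election,
	// so the new, empty TiKV (store 9) never gets any peers.
	minority := region{id: 1, peers: []uint64{3}, leader: 0}
	fmt.Println(scheduleAddPeer(minority, 9))
}
```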
A worm can still live if you cut it in half, haha.
Alright, thank you very much. My topic is the combination of a computing power network with TiDB. Since a computing power network is composed of multiple data centers and the network has a significant impact, I want each partition to be able to keep providing services after a network partition, and then, once the network is restored, to see whether eventual data consistency can be achieved.
I have conducted a network cable disconnection test. When most nodes have issues, the cluster becomes unavailable. However, once all the network cables are plugged back in, the cluster becomes available again.