Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Due to a network failure, TiDB split from one cluster into two smaller clusters; without considering data consistency, can both small clusters provide services?
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v6.0.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
In the CAP theorem, TiDB prioritizes CP, which means that when a network failure occurs, a cluster may split into two smaller clusters that cannot communicate with each other, and to maintain data consistency only one of them provides services externally. If we do not care about data consistency after the network is restored, is it possible for both small clusters to provide services?
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
PD has only one Leader, and the side where that Leader sits is the one that provides services externally. However, your statement confuses me: if there are two cabinets, each with 3 TiDB, 3 PD, and 3 TiKV, and the two cabinets lose connectivity with each other, does that become two clusters? It would be easy to explain if the number of PD nodes were odd.
The idea is quite bold. With 3 machines and 3 replicas it is theoretically possible: each side could become an independent cluster, and you could even end up with separate clusters each built around a single TiKV. There are ways to achieve this.
It is incorrect to say that one cluster can be turned into three clusters.
The premise is incorrect; network isolation does not turn one cluster into two clusters.
Therefore, a good practice is to require the cluster to run an odd number of PD nodes rather than an even number. When network isolation splits it into two small clusters, exactly one of them holds a majority. If the isolation produces three or more small clusters, it is possible that none of them holds a majority, in which case PD stops serving externally altogether.
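To make the majority rule concrete, here is a minimal sketch (illustrative only, not PD's actual implementation) of why a two-way split of an odd-sized PD cluster always leaves exactly one side with a quorum, while an even-sized one can leave neither side able to elect a leader:

```go
package main

import "fmt"

// quorum returns the minimum number of voting members a Raft group of the
// given size needs in order to elect a leader or commit a proposal.
func quorum(total int) int {
	return total/2 + 1
}

// canServe reports whether a partition holding `members` out of `total`
// voters can still elect a leader and keep serving.
func canServe(members, total int) bool {
	return members >= quorum(total)
}

func main() {
	// 3 PD nodes split 2/1: exactly one side keeps a quorum.
	fmt.Println(canServe(2, 3), canServe(1, 3)) // true false

	// 6 PD nodes split 3/3: neither side reaches the quorum of 4,
	// so PD stops serving until connectivity is restored.
	fmt.Println(canServe(3, 6), canServe(3, 6)) // false false
}
```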
Is there a hard limit on the number of PD nodes? I always thought an odd number was just a recommendation. In a previous project one node went down, leaving only 2 PD nodes, and the even count really did cause problems: leader switching happened frequently, with each node voting only for itself.
Practice yields truth, so you can try it out. But this is not simple cell division, where one cell splits and both daughter cells are fully functional. In short, it depends: if each of the two separated small clusters meets the minimum architecture requirements for deploying a TiDB cluster, it can be made to work, but the data may be incomplete.
That’s not what I meant. The official guidelines do not impose strict limits; they are merely suggested values. It just means that when we deploy, we should follow best practices, which can usually help avoid many issues.
My understanding is that CP sacrifices some availability in order to deal with the partition problem caused by network failures, right?
Brainstorming~ 
Are you planning to split PD? Then change from 3 PD, 2 TiDB, 3 TiKV to two sets of 1 PD, 1 TiDB, 1 TiKV?
What you’re describing is a network partition scenario. The purpose of having 3 replicas on the TiKV side is to handle this issue. No matter how the network is partitioned, one side will have 1 replica and the other side will have 2 replicas.
On the side with 1 replica, data cannot be committed, because a write cannot be acknowledged by a majority of the replicas. The side with 2 replicas can commit data normally, so only that side can provide normal service.
If the side with 1 replica could write data, it would be rolled back according to the Raft protocol once the network is restored.
The Raft protocol is designed to handle this situation. It is not possible to have both sides provide service simultaneously. At least with the current TiKV architecture, this cannot be achieved.
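As a rough illustration of the commit rule described above (a simplified model, not TiKV's actual code): a Raft leader only advances its commit index to an entry once a majority of the region's replicas, itself included, have replicated it, which is why the 1-replica side of a 3-replica region can never commit, and why anything it appended locally is rolled back once the network heals:

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority of a
// region's replicas, which is the furthest point a Raft leader may commit.
// matchIndex holds the last replicated index of every replica, leader included.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	q := len(sorted)/2 + 1
	// The entry at this position is replicated on at least q replicas.
	return sorted[len(sorted)-q]
}

func main() {
	// Majority side of a 3-replica region: the leader and one follower are
	// both at index 7, the isolated replica lags at 5 -> writes keep committing.
	fmt.Println(commitIndex([]int{7, 7, 5})) // 7

	// A leader isolated on the 1-replica side may still append proposals
	// locally (index 9), but the other two replicas are unreachable at 5,
	// so nothing past 5 commits; the local-only entries are discarded when
	// the partition heals and the real leader is seen again.
	fmt.Println(commitIndex([]int{9, 5, 5})) // 5
}
```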
What if the minority side redeploys its own tiup control machine, with a topology file that only includes the minority side's nodes? Would that trick the cluster into starting successfully?
TiKV has a local raft CF that records each region's local state: how many replicas the region has, who its peers are, and which store each peer lives on. Can't fool it that way.
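A simplified model of the point being made here (the type and field names are illustrative, not TiKV's real protobuf definitions): each region's local state records the full peer list with store IDs, so a trimmed-down tiup topology file cannot shrink the voter set that quorum is counted against:

```go
package main

import "fmt"

// Peer identifies one replica of a region and the TiKV store it lives on.
type Peer struct {
	PeerID  uint64
	StoreID uint64
}

// RegionLocalState sketches the per-region metadata every TiKV store keeps
// locally in its raft column family: the complete peer list, regardless of
// what any deployment topology file claims.
type RegionLocalState struct {
	RegionID uint64
	Peers    []Peer
}

// canElectLeader checks whether the stores reachable in this partition hold
// a majority of the peers recorded in the region's local state.
func (s RegionLocalState) canElectLeader(reachableStores map[uint64]bool) bool {
	alive := 0
	for _, p := range s.Peers {
		if reachableStores[p.StoreID] {
			alive++
		}
	}
	return alive >= len(s.Peers)/2+1
}

func main() {
	region := RegionLocalState{
		RegionID: 1,
		Peers:    []Peer{{101, 1}, {102, 2}, {103, 3}},
	}
	// The minority side only reaches store 3. Redeploying tiup with a
	// topology that lists only store 3 does not change region.Peers,
	// so the quorum check still fails.
	fmt.Println(region.canElectLeader(map[uint64]bool{3: true})) // false
}
```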
Thanks for the reminder; here is the situation. We want to deploy TiDB across multiple cross-region data centers, which increases the likelihood of network failures. Suppose there are originally seven PD nodes and a network partition splits them into two smaller clusters, with 4 and 3 PD nodes respectively. The side with 4 PD nodes will keep providing services. Since the side with 3 PD nodes is isolated from the other side, is there any possibility for it to re-elect a leader? Would we need to reset the total number of PD nodes?
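For reference, applying the same majority rule as in the earlier sketch to this scenario (illustrative arithmetic only): with 7 registered PD members the quorum is 4, and a partition does not change the member count PD tracks, so the 4-node side keeps its leader while the 3-node side cannot elect one:

```go
package main

import "fmt"

func main() {
	total := 7            // PD members registered in the cluster
	quorum := total/2 + 1 // 4
	fmt.Println(4 >= quorum) // true: the 4-PD partition can elect a leader
	fmt.Println(3 >= quorum) // false: the 3-PD partition cannot
}
```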
Then could we use the original tiup to add a new TiKV node on the minority side, increase the number of region replicas, let the minority side re-elect a leader, and update the local raft CF?
The newly added TiKV has no regions, so peers have to be added to it. Add-peer tasks are issued by PD to each region's leader TiKV. If the regions on the minority side cannot elect a leader, they cannot receive the add-peer commands issued by PD, so no replicas can be added and the newly added TiKV simply stays empty.
The explanation above assumes no code changes; if you modify the code, anything is possible and it can be implemented however you like.
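A toy sketch of the scheduling flow just described (not PD's real API, just the shape of the argument): PD can only hand an add-peer operator to a region leader, so on a side that cannot elect one, the new empty TiKV never receives any replicas:

```go
package main

import (
	"errors"
	"fmt"
)

// region is a toy model of what PD knows about one region: its peers and
// which replica, if any, is currently the leader. Illustrative only.
type region struct {
	id     uint64
	peers  []uint64 // store IDs holding a replica
	leader uint64   // 0 means no leader could be elected
}

// scheduleAddPeer mimics the shape of PD's scheduling: the add-peer operator
// is handed to the region leader, which then drives the Raft conf change.
// With no leader there is nobody to hand it to.
func scheduleAddPeer(r region, newStore uint64) error {
	if r.leader == 0 {
		return errors.New("no leader: add-peer operator cannot be delivered")
	}
	fmt.Printf("sent AddPeer(store %d) to leader on store %d\n", newStore, r.leader)
	return nil
}

func main() {
	// Minority side: the region's surviving replica cannot win an election,
	// so the new, empty TiKV (store 9) never gets any peers.
	minority := region{id: 1, peers: []uint64{3}, leader: 0}
	fmt.Println(scheduleAddPeer(minority, 9))
}
```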
A worm can still live if you cut it in half, haha.
Alright, thank you very much. My topic is the combination of a computing power network with TiDB. Since a computing power network is composed of multiple data centers and the network has a significant impact, I want each partition to be able to keep providing services after a network partition, and then, once the network is restored, to see whether eventual data consistency can be achieved.
I have conducted a network cable disconnection test. When most nodes have issues, the cluster becomes unavailable. However, once all the network cables are plugged back in, the cluster becomes available again.