Within the election timeout, even if the Region Leader itself is cut off (e.g., by a network partition), it still acts as the Region Leader. During this window, reads are served via Lease Read, i.e., local reads. Will writes to this Region Leader still succeed? How does it coordinate with the other Region replicas?
As long as the number of nodes is odd, you should be fine. With an even number of nodes, a partition can leave neither side with a majority of replicas, which can cause the cluster to malfunction.
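To make the odd/even point concrete, here is a minimal sketch of the majority arithmetic that Raft relies on. The function names are hypothetical and chosen for illustration; this is not TiKV code.

```python
def quorum(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many replicas can be lost while a majority still survives."""
    return n - quorum(n)


def some_side_has_quorum(n: int, side_a: int) -> bool:
    """For a two-way partition of n replicas, check whether either side holds a majority."""
    side_b = n - side_a
    return max(side_a, side_b) >= quorum(n)


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Under this arithmetic, 4 replicas tolerate only one failure, the same as 3, and a 2/2 split leaves no side with a majority (`some_side_has_quorum(4, 2)` is `False`), whereas any two-way split of 3 replicas always leaves one side able to make progress.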
TiKV handles split-brain scenarios primarily through the Raft protocol. Within the election timeout, even if the Region Leader is isolated (for example, by a network partition), it remains the Region Leader. During this window, reads are served via Lease Read, i.e., local reads. This preserves linearizability: once a value has been written at a certain point in time, any read after that point returns that value and never an earlier one [*].
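The idea behind a lease read can be sketched as follows. This is a simplified illustration, not TiKV's implementation: the class, field names, and the timing values are assumptions. The key assumption is that the lease is shorter than the election timeout, so no new Leader can be elected while the old Leader still considers its lease valid.

```python
from dataclasses import dataclass


@dataclass
class LeaderLease:
    granted_at: float      # when a majority last confirmed this leader (seconds)
    lease_duration: float  # assumed to be shorter than the election timeout

    def can_serve_local_read(self, now: float) -> bool:
        """A local read is allowed only while the lease has not expired."""
        return now < self.granted_at + self.lease_duration


ELECTION_TIMEOUT = 10.0  # illustrative value
lease = LeaderLease(granted_at=0.0, lease_duration=9.0)  # illustrative value

# The safety condition: the lease must expire before a new election can finish.
assert lease.lease_duration < ELECTION_TIMEOUT

print(lease.can_serve_local_read(5.0))  # inside the lease window
print(lease.can_serve_local_read(9.5))  # lease expired, local read must be refused
```

Once the lease expires, an isolated Leader can no longer answer reads locally and would have to confirm its leadership with a majority again, which it cannot do from the minority side of a partition.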
Additionally, if a split-brain occurs and a client request reaches the minority partition, the client will not receive an Ack. When the client retries and the request reaches the majority partition, it will receive an Ack. After the network recovers, the nodes in the minority partition automatically become Followers [*].
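The step-down after recovery follows from the standard Raft term rule: a node that sees a message carrying a higher term adopts that term and becomes a Follower. A minimal sketch, with a hypothetical `Node` class used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    term: int
    role: str  # "leader" or "follower"

    def on_message(self, sender_term: int) -> None:
        """Raft rule: on seeing a higher term, adopt it and step down to follower."""
        if sender_term > self.term:
            self.term = sender_term
            self.role = "follower"


# The old leader was isolated in term 5; the majority side elected a new leader in term 6.
old_leader = Node("n1", term=5, role="leader")
old_leader.on_message(sender_term=6)
print(old_leader.role, old_leader.term)
```

Because the majority side has moved to a higher term during the partition, the first message the old Leader receives after the network heals is enough to demote it, with no manual intervention.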
If more than half of the PD nodes are damaged, the cluster cannot elect a leader, so you can either follow the procedure for the scenario where all nodes are damaged or handle it as a split-brain scenario. Alternatively, you can start a single node on its own and then scale the other nodes in or out as in the split-brain example [*].