After one of the three PDs in the cluster fails, the remaining two PDs still support read and write operations, right? How do they elect a leader?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 3个PD的集群坏掉一个PD后,剩下两个PD还支持读写吧,它们是怎么选leader的?

| username: 江湖故人

Why is there no split-brain issue?

| username: 江湖故人 | Original post link

There is probably no split-brain issue because TiDB is a CP system, but I don’t understand how the Leader is elected.

| username: xfworld | Original post link

You can look up the Raft leader election process.

The remaining two PDs can support read and write operations, but to be on the safe side, it’s best to have three nodes.

Deploying an odd number of nodes is the basic recommendation. Because a leader must win a majority of votes, there won’t be a split-brain scenario.

| username: changpeng75 | Original post link

The Leader Election Process in Raft

When to Initiate an Election

When the cluster starts, all servers are followers. A server initiates an election when it does not receive a valid message from the leader or a candidate within a specified time. This time is called the election timeout, and it is a random value (e.g., 200ms-500ms). What counts as a valid message? From the leader, it is a heartbeat message; from a candidate, it is a vote request message. Two timing parameters are involved here: the election timeout and the heartbeat interval (the interval at which the leader sends heartbeat messages). The heartbeat interval must be much smaller than the minimum election timeout, i.e. heartbeat interval << min(election timeout), so that followers do not initiate unnecessary elections.
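As a rough illustration of the timing rule above, here is a minimal Python sketch. The 200-500 ms range is taken from the example in the text; the 50 ms heartbeat interval and all function names are assumptions made for this sketch and are not PD's actual configuration or code.

```python
import random

# The 200-500 ms range comes from the example above.
# The 50 ms heartbeat interval is an assumed value for illustration only.
ELECTION_TIMEOUT_MIN_MS = 200
ELECTION_TIMEOUT_MAX_MS = 500
HEARTBEAT_INTERVAL_MS = 50


def new_election_timeout(rng: random.Random) -> int:
    """Pick a fresh random election timeout in milliseconds."""
    return rng.randint(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)


def should_start_election(ms_since_last_valid_message: int, timeout_ms: int) -> bool:
    """A follower starts an election once it has heard nothing valid for timeout_ms."""
    return ms_since_last_valid_message >= timeout_ms


rng = random.Random(42)
timeout = new_election_timeout(rng)

# Several heartbeats fit inside even the shortest possible timeout, so a
# healthy leader resets the follower's timer well before it fires.
heartbeats_per_min_timeout = ELECTION_TIMEOUT_MIN_MS // HEARTBEAT_INTERVAL_MS
```

Because every server draws its own random timeout, it is unlikely that two followers time out at exactly the same moment.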

Voting Process

  1. The follower increments its term.

  2. The follower changes its state to candidate.

  3. The candidate votes for itself.

  4. The candidate sends a vote request (RequestVote) to other machines in the cluster.

  5. The candidate ends its state under the following conditions:

    1. If more than half of the servers in the cluster grant their vote, the candidate becomes the leader, immediately sends heartbeat messages to all servers, and then keeps sending them periodically. In any term, each server can vote only once. If several candidates in the same term have each voted for themselves, a candidate needs additional votes from followers to win the election.
    2. If another leader is discovered and its term is not less than the candidate’s term, the candidate reverts to a follower. Otherwise, the message is discarded.
    3. If no server wins the election, for example because of a network timeout or server failures, no leader is elected in that term, and the servers simply retry after a timeout. One such case is a split vote: in a cluster of three servers, all three may start an election at the same time and each vote for itself, so no one reaches a majority. If all servers then retried at the same moment, the stalemate could repeat indefinitely. Raft avoids this with the random election timeout mentioned above, which makes repeated split votes very unlikely.
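The voting steps above can be sketched as a small in-memory simulation. This is a simplified illustration, not PD's or etcd's implementation: the `Server` class and `run_election` function are hypothetical names, messages are plain function calls, and the log comparison is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    term: int = 0
    state: str = "follower"
    voted_for: Optional[str] = None
    alive: bool = True

    def handle_request_vote(self, candidate: str, term: int) -> bool:
        """Grant at most one vote per term (log check omitted in this sketch)."""
        if not self.alive or term < self.term:
            return False
        if term > self.term:
            # A newer term resets the vote and demotes the server to follower.
            self.term, self.voted_for, self.state = term, None, "follower"
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(candidate: Server, cluster: List[Server]) -> bool:
    """Steps 1-5: bump term, become candidate, self-vote, request votes, count."""
    candidate.term += 1
    candidate.state = "candidate"
    candidate.voted_for = candidate.name
    votes = 1  # the candidate's own vote
    for peer in cluster:
        if peer is not candidate and peer.handle_request_vote(candidate.name, candidate.term):
            votes += 1
    # The majority is counted against the full cluster size, failed nodes included.
    if votes > len(cluster) // 2:
        candidate.state = "leader"
        return True
    return False  # stays a candidate and would retry after a new random timeout


pd1, pd2, pd3 = Server("pd1"), Server("pd2"), Server("pd3")
cluster = [pd1, pd2, pd3]

pd3.alive = False                        # one of the three PDs has failed
won = run_election(pd1, cluster)         # pd1 + pd2 give 2 of 3 votes

pd2.alive = False                        # a second PD fails
won_alone = run_election(pd1, cluster)   # only 1 of 3 votes remains
```

With one PD down, the two survivors still form a majority of three, so `won` is true. With two PDs down, the last node can only ever collect its own vote, so `won_alone` is false.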

When Followers Agree

A follower grants its vote if the term in the vote request is greater than or equal to its own current term, it has not already voted for a different candidate in that term, and the candidate’s log is at least as up to date as the follower’s own log. Log-related details are covered under log replication.
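The vote-granting check can be sketched as a pure function. This follows the standard Raft rule, where logs are compared by the term of the last entry first and by the last index second. The function and parameter names are hypothetical, chosen for this sketch only.

```python
from typing import Optional, Tuple

# A log position is (last_log_term, last_log_index).
LogPos = Tuple[int, int]


def log_is_up_to_date(candidate_last: LogPos, own_last: LogPos) -> bool:
    """Tuple comparison checks the last term first, then the last index."""
    return candidate_last >= own_last


def grant_vote(
    current_term: int,
    voted_for: Optional[str],
    own_last: LogPos,
    req_term: int,
    candidate_id: str,
    candidate_last: LogPos,
) -> bool:
    """Decide whether a follower agrees to a vote request."""
    if req_term < current_term:
        return False  # stale candidate
    if req_term == current_term and voted_for not in (None, candidate_id):
        return False  # already voted for someone else in this term
    # A larger request term starts a new term, so any earlier vote no longer counts.
    return log_is_up_to_date(candidate_last, own_last)
```

For example, `grant_vote(2, None, (2, 3), 2, "pd1", (1, 9))` is refused: the candidate's log is longer, but its last entry comes from an older term.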

How Terms are Updated

Any server that receives a request or response carrying a term larger than its own must update its term to that value (and, if it is a candidate or leader, step back to follower). This ensures that a leader can eventually be elected.
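A minimal sketch of the term rule, assuming a hypothetical `NodeState` record. In standard Raft, adopting a larger term also means stepping down to follower, which is what the sketch does.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    term: int
    role: str  # "follower", "candidate" or "leader"


def observe_term(node: NodeState, message_term: int) -> bool:
    """Apply the term rule to any incoming request or response.

    Returns True if the node adopted a larger term.
    """
    if message_term > node.term:
        node.term = message_term
        node.role = "follower"  # a stale leader or candidate steps down
        return True
    return False


stale_leader = NodeState(term=3, role="leader")
adopted = observe_term(stale_leader, 5)    # sees a newer term

candidate = NodeState(term=5, role="candidate")
ignored = observe_term(candidate, 4)       # older term changes nothing
```

This is how a leader that was cut off from the cluster gives up its role once it reconnects and sees that a newer term exists.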

| username: yiduoyunQ | Original post link

Why is there no split-brain issue?

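Because in a 3-node cluster a leader needs a majority, that is, 2 votes. The 2 remaining nodes can still reach that majority together, but a single isolated node never can, so the protocol guarantees that two leaders cannot be elected in the same term and a split-brain scenario is impossible.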
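The majority arithmetic behind this answer can be written out in a few lines. The function names are illustrative only.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of votes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, reachable_nodes: int) -> bool:
    """A group of nodes can elect a leader only if it contains a quorum."""
    return reachable_nodes >= quorum(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes may fail while a quorum is still available."""
    return cluster_size - quorum(cluster_size)


def both_sides_can_elect(cluster_size: int, side_a: int) -> bool:
    """Check whether two disjoint groups could each elect their own leader."""
    side_b = cluster_size - side_a
    return can_elect_leader(cluster_size, side_a) and can_elect_leader(cluster_size, side_b)


# For a 3-PD cluster: no way of splitting the nodes gives both sides a quorum.
split_brain_possible = any(both_sides_can_elect(3, a) for a in range(0, 4))
```

For 3 PDs the quorum is 2, so one failure is tolerated. A 4-node cluster needs 3 votes and also tolerates only one failure, which is why odd cluster sizes are usually recommended.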

How do they elect a leader?

You can refer to previous articles for the general logic:

| username: 春风十里 | Original post link

When two PDs are left, the cluster can still read and write normally. However, if one more fails and only one remains, a majority is no longer available, so PD can no longer serve reads or writes and the entire cluster becomes unavailable.

| username: hey-hoho | Original post link

Here’s an animation to see how Raft elects a leader:
https://raft.github.io/raftscope/index.html

It can simulate node failures.

| username: YuchongXU | Original post link

Raft protocol

| username: WinterLiu | Original post link

These are the advantages of the RAFT protocol.

| username: xingzhenxiang | Original post link

Redis Cluster also uses a Raft-like election for failover (it is inspired by Raft rather than being a full Raft implementation).

| username: 双开门变频冰箱 | Original post link

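A split could only happen if the two remaining PDs can no longer communicate with each other, and even then neither would have a majority, so no leader would be elected.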
