What is the election mechanism when the PD leader's term expires, and is there a high possibility for the original PD leader to continue serving as the leader?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic (translated): If the PD leader's term is up, how does the election mechanism work, and is the original PD leader fairly likely to continue as leader?

| username: Raymond

A question for the experts: if the PD leader's term expires, is the original PD leader likely to continue as leader? What does the election process look like at that point? I am studying some materials on this.

| username: WalterWj | Original post link

The PD leader generally does not change. A re-election happens only if the leader gets stuck or its responses take too long.

| username: Raymond | Original post link

Doesn't the Raft protocol have some notion of a term? I thought a new election must be triggered when the term ends.

| username: Raymond | Original post link

I think it can be understood this way: to keep its position, the leader periodically sends heartbeats to the followers. Each time a follower receives a heartbeat from the leader, it resets its election timeout, so the followers never initiate an election.

| username: huhaifeng | Original post link

If you're interested, you can study the Raft algorithm. There are Chinese resources online, for example GitHub - maemual/raft-zh_cn: Raft一致性算法论文的中文翻译 (a Chinese translation of the Raft paper). The election is triggered by the heartbeat (tick) timeout you mentioned, and that timeout is randomized so that followers do not all start elections at the same moment. A follower whose timeout expires initiates an election, and if it wins, the old leader is naturally no longer the leader.

| username: ljluestc | Original post link

PD (Placement Driver) is responsible for managing metadata and cluster coordination. If the PD leader’s term expires, it will trigger the Raft Leader Election mechanism to elect a new PD leader.

During the Raft Leader Election, the existing PD nodes participate in the election process. Each node has the opportunity to become the new leader. The Raft consensus algorithm ensures that only one node becomes the leader. The leader is responsible for accepting client requests, handling metadata changes, and coordinating cluster operations.

Whether the original PD leader is more likely to continue as the leader depends on the specific implementation of the Raft algorithm and the circumstances during the election. Raft is designed to provide strong consistency and fault tolerance, but it does not favor any particular node during the leader election process.

The leader election considers factors such as the availability and reliability of nodes, their communication latency, and the state of the Raft log. The node that meets the Raft algorithm’s requirements and reaches consensus among the participating nodes will become the new leader.

It is important to note that Raft is designed to handle failures and ensure a consistent view of the cluster. Therefore, even if the original PD leader had a good term in the past, it does not guarantee that it will continue as the leader in the next term. The leader election process aims to select the most suitable node as the new leader based on the current state of the cluster.

| username: redgame | Original post link

Yes, the original PD leader is very likely to remain the leader.

| username: Anna | Original post link

Unless a new candidate scores higher than the old leader, the leader will not be replaced at random.

| username: 南征北战 | Original post link

Unless the heartbeat is lost, an election to choose a new leader will not be initiated.

| username: TiDBer_jYQINSnf | Original post link

Exactly. As long as the leader and the followers stay connected, no re-election is initiated (except for a forced leader transfer).

| username: zhanggame1 | Original post link

Under normal circumstances, the leader will not change.

| username: undefined | Original post link

A leader's term does not expire. Put simply, in Raft, once a candidate is elected leader, it sends heartbeats to all followers (including the members that were also candidates) to maintain its leadership. So when does a leader switch happen?

  1. Network isolation: if the leader is cut off from the majority and the majority stops receiving its heartbeats, a new election is triggered. Only then does the so-called term change (the term number is incremented).
  2. Leader failure: most commonly the leader crashes, which has the same effect as the case above.

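Case 1 comes down to Raft's majority rule. Below is a minimal Python sketch assuming plain Raft semantics (it is not PD's code; the 5-member cluster and the 2/3 split are hypothetical): only the side of a partition that still holds a majority can elect a leader, so the isolated old leader gets replaced by one elected in a higher term.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


def side_can_make_progress(reachable: int, cluster_size: int) -> bool:
    """True if a group of mutually reachable members is a majority, so it can
    elect a leader and commit entries; a minority group can do neither."""
    return reachable >= quorum(cluster_size)


# Hypothetical 5-member cluster split 2 / 3, with the old leader on the 2-member side.
print(side_can_make_progress(2, 5))  # False: the old leader cannot commit anything
print(side_can_make_progress(3, 5))  # True: the majority side elects a new leader
```

When the partition heals, the old leader sees the higher term in messages from the new leader and steps down to follower, which is why the term number only moves forward in these failure cases.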
| username: Anna | Original post link

There is no concept of term expiration.

| username: Anna | Original post link

5.2 Leader Election

Raft uses a heartbeat mechanism to trigger leader elections. When servers start up, they all begin as followers. A server remains in the follower state as long as it receives valid RPCs from a leader or candidate. The leader periodically sends heartbeat messages (AppendEntries RPCs that carry no log entries) to all followers to maintain its authority. If a follower receives no communication for a period of time called the election timeout, it assumes there is no viable leader and initiates an election to choose a new one.
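The timer behavior described above can be sketched in Python. Assumptions that are not from the paper: time is modeled as discrete ticks, the timeout range of 10 to 20 ticks and the heartbeat interval of 5 ticks are arbitrary, and only heartbeats (not candidate RPCs) reset the timer. The sketch also shows the point raised in this thread: as long as heartbeats keep arriving, the term never changes.

```python
import random


class Follower:
    """Tick-based sketch of a Raft follower's election timer.

    Assumptions (not from the paper): time advances in discrete ticks, and
    each election timeout is a whole number of ticks drawn from [low, high].
    """

    def __init__(self, low=10, high=20, seed=None):
        self.rng = random.Random(seed)
        self.low, self.high = low, high
        self.current_term = 1
        self.state = "follower"
        self._reset_timer()

    def _reset_timer(self):
        # Draw a fresh randomized timeout and restart the countdown.
        self.timeout = self.rng.randint(self.low, self.high)
        self.elapsed = 0

    def on_heartbeat(self, leader_term):
        # A heartbeat from a leader whose term is at least ours keeps us
        # a follower and resets the election timer.
        if leader_term >= self.current_term:
            self.current_term = leader_term
            self.state = "follower"
            self._reset_timer()

    def tick(self):
        """Advance one tick; return True if this tick starts an election."""
        if self.state != "follower":
            return False
        self.elapsed += 1
        if self.elapsed >= self.timeout:
            # Nothing heard for a whole election timeout: start an election.
            self.current_term += 1
            self.state = "candidate"
            return True
        return False


f = Follower(seed=42)
# The leader sends a heartbeat every 5 ticks, shorter than any timeout (10..20).
elections = 0
for t in range(10_000):
    if t % 5 == 0:
        f.on_heartbeat(leader_term=1)
    elections += f.tick()
print(elections, f.current_term, f.state)  # 0 1 follower

# The leader crashes: no more heartbeats, so the follower eventually times out.
while not f.tick():
    pass
print(f.current_term, f.state)  # 2 candidate
```

The heartbeat interval must stay well below the smallest election timeout; otherwise followers would start elections against a perfectly healthy leader.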

To start an election, a follower first increments its current term number and transitions to the candidate state. It then sends RequestVote RPCs to other server nodes in the cluster in parallel to solicit votes for itself. The candidate remains in this state until one of three things happens: (a) it wins the election, (b) another server becomes the leader, or (c) a period of time passes without any candidate winning. These outcomes are discussed in the following paragraphs.
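The steps a follower takes to start an election can be sketched as follows. Assumptions: the member names `pd-1` to `pd-3` are hypothetical, the RPCs are only built rather than sent, and the candidate's vote for itself (described in the full paper, not in this excerpt) is included.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestVoteArgs:
    term: int          # the candidate's new term
    candidate_id: str  # who is asking for the vote


@dataclass
class Server:
    server_id: str
    peers: List[str]
    current_term: int = 0
    state: str = "follower"
    voted_for: Optional[str] = None

    def start_election(self) -> Dict[str, RequestVoteArgs]:
        """Become a candidate and build one RequestVote request per peer."""
        self.current_term += 1           # increment the current term
        self.state = "candidate"         # transition to the candidate state
        self.voted_for = self.server_id  # a candidate votes for itself
        # A real server sends these RPCs in parallel; here we only build them.
        return {p: RequestVoteArgs(self.current_term, self.server_id) for p in self.peers}


s = Server("pd-1", peers=["pd-2", "pd-3"], current_term=4)
requests = s.start_election()
print(s.state, s.current_term, requests["pd-2"])
# candidate 5 RequestVoteArgs(term=5, candidate_id='pd-1')
```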

When a candidate receives votes from a majority of the servers in the cluster for the same term, it wins the election and becomes the leader. Each server votes for at most one candidate per term, on a first-come, first-served basis (note: Section 5.4 adds an additional restriction on voting). The majority rule ensures that at most one candidate can win the election for a given term (the election safety property in Figure 3). Once a candidate wins, it becomes the leader and immediately sends heartbeat messages to all other servers to establish its authority and prevent new elections.
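A sketch of the voting rule from one server's point of view, deliberately leaving out the Section 5.4 log check: a server grants at most one vote per term, and a candidate wins only with a strict majority of the whole cluster. Because each of the N servers votes at most once per term, two different candidates cannot both collect more than N/2 votes in the same term.

```python
from typing import Optional


class Voter:
    """Sketch of how one server answers RequestVote.

    Simplification: the Section 5.4 log up-to-date check is deliberately omitted.
    """

    def __init__(self):
        self.current_term = 0
        self.voted_for: Optional[str] = None

    def handle_request_vote(self, term: int, candidate_id: str) -> bool:
        if term < self.current_term:
            return False                   # request from an older term
        if term > self.current_term:
            self.current_term = term       # newer term: adopt it and
            self.voted_for = None          # start with an unused vote
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id  # first come, first served
            return True
        return False                       # already voted for someone else


def wins_election(votes: int, cluster_size: int) -> bool:
    """A candidate needs a strict majority of the whole cluster."""
    return votes > cluster_size // 2


v = Voter()
print(v.handle_request_vote(3, "a"))  # True: first request in term 3
print(v.handle_request_vote(3, "b"))  # False: the term-3 vote already went to "a"
print(v.handle_request_vote(4, "b"))  # True: term 4 brings a new vote
print(wins_election(3, 5), wins_election(2, 5))  # True False
```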

While waiting for votes, a candidate may receive an AppendEntries RPC from another server claiming to be the leader. If the term number in this RPC is at least as large as the candidate’s current term number, the candidate recognizes the leader as legitimate and reverts to the follower state. If the term number in the RPC is smaller, the candidate rejects the RPC and remains in the candidate state.
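The candidate's decision on receiving AppendEntries reduces to a term comparison. In this sketch the helper name `candidate_on_append_entries` is made up, and the returned `recognized` flag only means the sender is accepted as leader; the log-consistency checks that AppendEntries also performs are out of scope.

```python
from typing import Tuple


def candidate_on_append_entries(current_term: int, rpc_term: int) -> Tuple[str, int, bool]:
    """What a candidate does when AppendEntries arrives while it waits for votes.

    Returns (new_state, new_term, recognized).
    """
    if rpc_term >= current_term:
        # The sender's term is at least ours: it is a legitimate leader.
        return "follower", rpc_term, True
    # The sender is stale (smaller term): reject and keep campaigning.
    return "candidate", current_term, False


print(candidate_on_append_entries(5, 5))  # ('follower', 5, True)
print(candidate_on_append_entries(5, 7))  # ('follower', 7, True)
print(candidate_on_append_entries(5, 4))  # ('candidate', 5, False)
```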

The third possible outcome is that a candidate neither wins nor loses the election: if many followers become candidates at the same time, the votes can be split so that no candidate obtains a majority. When this happens, each candidate times out and starts a new election by incrementing its term number. Without extra measures, however, split votes could repeat indefinitely.

Raft uses randomized election timeouts to ensure that split votes are rare and are resolved quickly when they do occur. To prevent split votes in the first place, election timeouts are chosen randomly from a fixed interval (e.g., 150-300 ms). This spreads out the servers so that in most cases only a single server times out; it wins the election and sends heartbeats before any other server times out. The same mechanism handles split votes: each candidate restarts its randomized election timeout at the start of an election and waits for that timeout to elapse before starting the next one, which reduces the likelihood of another split vote in the new election. Section 9.3 shows that this approach elects a leader quickly.
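To see why randomization helps, here is a small Monte Carlo sketch. Simplifying assumption (ours, not the paper's): an election counts as "clean" when the first server to time out is ahead of the second by more than one round of RPCs, here a hypothetical 10 ms. The 150-300 ms interval comes from the text; everything else is illustrative.

```python
import random


def timeout_gap(n_servers, low_ms, high_ms, rng):
    """Draw one election timeout per server; return the gap between the two earliest."""
    timeouts = sorted(rng.uniform(low_ms, high_ms) for _ in range(n_servers))
    return timeouts[1] - timeouts[0]


def clean_election_rate(n_servers, broadcast_ms, low_ms=150.0, high_ms=300.0,
                        trials=10_000, seed=0):
    """Fraction of trials in which the first server to time out leads the second
    by more than broadcast_ms, so it can collect votes and send heartbeats
    before anyone else becomes a candidate (simplified model)."""
    rng = random.Random(seed)
    clean = sum(timeout_gap(n_servers, low_ms, high_ms, rng) > broadcast_ms
                for _ in range(trials))
    return clean / trials


# Hypothetical 5-server cluster with a 10 ms round of RPCs.
print(clean_election_rate(5, 10))                                 # roughly 0.7
print(clean_election_rate(5, 10, low_ms=150.0, high_ms=150.0))    # 0.0: fixed timeouts always collide
```

With identical timeouts every server times out at once, which is exactly the split-vote scenario. Widening the interval (e.g. `high_ms=600.0`) raises the clean-election rate, at the cost of slower failover when the leader really does fail.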

Leader election is an example of how understandability as a design principle guided our choice between alternatives. Initially we planned to use a ranking system: each candidate was assigned a unique rank, which was used to choose between competing candidates. If a candidate discovered another candidate with a higher rank, it would return to the follower state so that the higher-ranked candidate could more easily win the next election. However, this approach created subtle availability issues (if a higher-ranked server failed, a lower-ranked server might need to time out and become a candidate again, but if it did so too soon, it could reset progress toward electing a leader). We made several adjustments to the algorithm, but each one introduced new problems. In the end we concluded that the randomized retry approach was more obvious and easier to understand.