Is there a situation where multiple nodes reach the election timeout simultaneously during Raft leader election?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: raft leader选举时是否存在同时到达election time的情况? (During Raft leader election, can multiple nodes reach the election timeout at the same time?)

| username: alfred

| username: ddhe9527 | Original post link

In the Raft paper, Section 5.2 describes:

Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly. To prevent split votes in the first place, election timeouts are chosen randomly from a fixed interval (e.g., 150–300ms). This spreads out the servers so that in most cases only a single server will time out; it wins the election and sends heartbeats before any other servers time out. The same mechanism is used to handle split votes. Each candidate restarts its randomized election timeout at the start of an election, and it waits for that timeout to elapse before starting the next election; this reduces the likelihood of another split vote in the new election. Section 9.3 shows that this approach elects a leader rapidly.

In TiDB, the minimum election timeout is controlled by raft-min-election-timeout-ticks and the maximum by raft-max-election-timeout-ticks. Each node picks a random timeout within this interval, so nodes initiate elections at different times, which avoids most cases of several nodes campaigning simultaneously and splitting the vote. This does not completely rule out split votes, though: if the vote is split and no Leader can be elected, each candidate redraws its random timeout and a new round of voting begins.
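
As a concrete illustration, here is a minimal std-only Rust sketch of the randomized-timeout idea. The config names raft-base-tick-interval, raft-min-election-timeout-ticks, and raft-max-election-timeout-ticks are real TiKV options, but the ElectionTimer type, the xorshift PRNG, and the 1s/10/20 values are assumptions for this toy example, not TiKV's actual implementation:

```rust
use std::time::{Duration, Instant};

/// Toy election timer: each peer draws a random timeout in
/// [min_ticks, max_ticks) ticks and redraws it at the start of
/// every election round, as described in Section 5.2 of the paper.
struct ElectionTimer {
    base_tick: Duration, // cf. raft-base-tick-interval
    min_ticks: u64,      // cf. raft-min-election-timeout-ticks
    max_ticks: u64,      // cf. raft-max-election-timeout-ticks
    deadline: Instant,
    rng_state: u64,      // tiny xorshift64 PRNG so the example is std-only
}

impl ElectionTimer {
    fn new(base_tick: Duration, min_ticks: u64, max_ticks: u64, seed: u64) -> Self {
        let mut timer = ElectionTimer {
            base_tick,
            min_ticks,
            max_ticks,
            deadline: Instant::now(),
            rng_state: seed.max(1), // xorshift state must not start at 0
        };
        timer.reset();
        timer
    }

    fn next_rand(&mut self) -> u64 {
        // xorshift64: enough to spread peers apart, not crypto-grade.
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Redraw the timeout (assumes max_ticks > min_ticks). Because each
    /// candidate redraws before the next round, two candidates that just
    /// split the vote are unlikely to time out together again.
    fn reset(&mut self) {
        let ticks = self.min_ticks + self.next_rand() % (self.max_ticks - self.min_ticks);
        self.deadline = Instant::now() + self.base_tick * ticks as u32;
    }

    fn remaining(&self) -> Duration {
        self.deadline.saturating_duration_since(Instant::now())
    }
}

fn main() {
    // Three peers with different seeds: in the common case their deadlines
    // differ, so one campaigns and wins before the others' timers fire.
    for peer in 1..=3u64 {
        let timer = ElectionTimer::new(Duration::from_secs(1), 10, 20, peer * 0x9E37_79B9);
        println!("peer {peer} would start campaigning in {:?}", timer.remaining());
    }
}
```

With a 1 s base tick this gives timeouts between 10 s and 20 s; the Raft paper's 150–300 ms example uses the same logic on a much shorter scale. The draws can still collide, which is why the retry with a fresh random timeout is part of the protocol.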

| username: alfred | Original post link

However, the Region Leader elected this way is random. Does the election take the load of the TiKV nodes into account when choosing a suitable Region Leader?

| username: ddhe9527 | Original post link

During elections, load is not considered. For example, if a TiKV node goes down, every Region whose Leader was on that node independently holds a Raft Leader election. Because the new Leaders are elected randomly, they end up roughly evenly distributed across the remaining TiKV nodes. Afterwards, PD schedules Leaders and Regions based on the load of the TiKV nodes.
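
To see why the new Leaders spread out roughly evenly, here is a toy std-only Rust simulation. The store IDs, the Region count, and the uniform coin flip are all made up for illustration; in reality the "winner" is simply whichever surviving peer's randomized timeout fires first:

```rust
use std::collections::HashMap;

// Toy model: one failed store held the Leader for 9_000 Regions.
// Each Region's surviving peers elect a new Leader essentially at random,
// so the Leaders land roughly evenly on the remaining stores. PD's later
// load-based scheduling is deliberately not modeled here.
fn main() {
    let survivors = [2u64, 3, 4]; // hypothetical TiKV store IDs still alive
    let mut leaders: HashMap<u64, u64> = HashMap::new();

    // Tiny xorshift64 PRNG so the example needs no external crates.
    let mut rng: u64 = 0xDEAD_BEEF;
    let next = |state: &mut u64| -> u64 {
        *state ^= *state << 13;
        *state ^= *state >> 7;
        *state ^= *state << 17;
        *state
    };

    for _region in 0..9_000 {
        let winner = survivors[(next(&mut rng) % survivors.len() as u64) as usize];
        *leaders.entry(winner).or_insert(0) += 1;
    }

    for store in survivors {
        // Expect roughly 3_000 each: an even split, regardless of load.
        println!("store {store}: {} new Leaders", leaders.get(&store).copied().unwrap_or(0));
    }
}
```

If one store then becomes hot, PD's schedulers (e.g. balance-leader) transfer Leaders away from it; the election itself never looks at load.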

| username: alfred | Original post link

In other words, load-based scheduling is solely PD's job.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. No new replies are allowed.