How should this situation be handled during a config change?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 想问问configchange的时候,这种情况该怎么处理

| username: wzharies

While performing a config change that adds a new node 5, node 5 is created successfully, but due to a network partition it remains unaware of the other nodes, so it elects itself leader and writes a no-op entry. In the other partition, containing nodes 1, 2, 3, and 4, nodes 1, 2, and 3 have already been removed, leaving only node 4. When the partitions heal, node 4 learns of node 5's existence, but node 4's vote requests to node 5 can never succeed because node 4's LogTerm is smaller.

As a result, the whole cluster enters a cycle: node 4's RequestVote is rejected, so it increments its term and sends another request to node 5; node 5 steps down on seeing the higher term, then re-elects itself as leader; node 4's next RequestVote is rejected again, and so on.
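The livelock comes from two independent checks in the standard Raft vote handler: a higher term always forces the receiver to step down, but the log up-to-date check can still reject the vote. The following is a minimal illustrative sketch (the struct layout and field names are my own, not from any particular codebase) showing how node 5, as leader, steps down on node 4's higher term yet still denies the vote:

```go
package main

import "fmt"

// Minimal illustrative state; Term/Vote/log layout are assumptions
// for this sketch, not taken from a specific implementation.
type Raft struct {
	Term     uint64
	Vote     uint64   // 0 means "not voted this term"
	logTerms []uint64 // term of each log entry, 1-indexed
	isLeader bool
}

func (r *Raft) lastTerm() uint64 {
	if len(r.logTerms) == 0 {
		return 0
	}
	return r.logTerms[len(r.logTerms)-1]
}

func (r *Raft) lastIndex() uint64 { return uint64(len(r.logTerms)) }

// handleRequestVote shows the two checks whose interaction causes the
// cycle described above.
func (r *Raft) handleRequestVote(candID, term, lastLogTerm, lastLogIndex uint64) bool {
	if term > r.Term {
		r.Term = term      // node 5 adopts node 4's higher term...
		r.Vote = 0
		r.isLeader = false // ...and steps down from leadership
	}
	upToDate := lastLogTerm > r.lastTerm() ||
		(lastLogTerm == r.lastTerm() && lastLogIndex >= r.lastIndex())
	if term < r.Term || !upToDate || (r.Vote != 0 && r.Vote != candID) {
		return false // node 4's smaller LogTerm fails the up-to-date check
	}
	r.Vote = candID
	return true
}

func main() {
	// Node 5: leader at term 2, holding only its own no-op entry (term 2).
	node5 := &Raft{Term: 2, logTerms: []uint64{2}, isLeader: true}
	// Node 4 campaigns at term 3, but its last log entry is from term 1.
	granted := node5.handleRequestVote(4, 3, 1, 5)
	fmt.Println(granted, node5.isLeader) // false false: stepped down, vote denied
}
```

Node 5 then times out, re-elects itself at an even higher term, and the cycle repeats.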

More generally, if a new peer is added while the leader is stepping down or no leader has been elected, the new node is likely to time out, start an election, and elect itself leader.

| username: wzharies | Original post link

My current solution is that a peer newly created by a config change is not allowed to initiate an election.

| username: 赖沐曦_bullfrog | Original post link

Our solution is: a new node with an empty Prs (peer progress set) must not be allowed to become a candidate or leader.
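This guard can be sketched as a check at the start of the election path. The snippet below is a minimal illustration, not code from any real project; the names `Prs`, `tick`, and `hup` follow TinyKV-style conventions but are assumptions here:

```go
package main

import "fmt"

type StateType int

const (
	StateFollower StateType = iota
	StateCandidate
	StateLeader
)

// Raft is a minimal sketch of the election path. A peer freshly created
// by a config change starts with an empty Prs because it has not yet
// received a snapshot telling it who its peers are.
type Raft struct {
	id              uint64
	State           StateType
	Prs             map[uint64]struct{} // peer progress; empty on a brand-new peer
	electionElapsed int
	electionTimeout int
}

// tick advances the election timer and fires an election on timeout.
func (r *Raft) tick() {
	r.electionElapsed++
	if r.electionElapsed >= r.electionTimeout {
		r.electionElapsed = 0
		r.hup()
	}
}

// hup starts an election, but refuses when Prs is empty: a peer that
// does not yet know the cluster membership must stay a follower.
func (r *Raft) hup() {
	if len(r.Prs) == 0 {
		return // new peer from a conf change: do not campaign
	}
	r.State = StateCandidate
	// ... increment term, vote for self, send RequestVote to peers ...
}

func main() {
	// A brand-new peer created by a conf change: it knows no other nodes yet.
	r := &Raft{id: 5, Prs: map[uint64]struct{}{}, electionTimeout: 10}
	for i := 0; i < 30; i++ {
		r.tick()
	}
	fmt.Println(r.State == StateFollower) // stays follower despite timeouts
}
```

Once the peer receives a snapshot and its Prs is populated, the guard no longer triggers and it can campaign normally.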

| username: wzharies | Original post link

Yes, there are quite a few bugs related to this. Another bug I found: when a newly created peer is applying a snapshot and needs to update the region, it should not call regionRange.Delete.

| username: TiDBer_gB1c0idL | Original post link

Encountered the same problem, seeking guidance.

| username: wzharies | Original post link

A peer newly created during a config change has no information about the other nodes. Until it obtains that membership information, it should not initiate an election.