Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: request Timeout issue in TestOneSplit3B
After investigation, the issue turned out to be caused by a network partition. Nodes in one partition kept timing out during elections, driving their Term very high. When the partitions were rejoined, this forced the Leader to step down to Follower and triggered a new election. At the end of the test, the following request is issued immediately after the partitions are rejoined:
```go
req := NewRequest(left.GetId(), left.GetRegionEpoch(), []*raft_cmdpb.Request{NewGetCfCmd(engine_util.CfDefault, []byte("k2"))})
```
At this point, because the cluster is still in the middle of an election, the request keeps timing out. With small probability, no Leader is elected within the request timeout, so the response comes back nil and the test fails with an error.
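To make the failure mode concrete, here is a minimal sketch (with a hypothetical `node` type, not the tinykv API) of why the minority partition ends up with a huge Term: every election timeout bumps the term, but the vote requests never reach a quorum, so the term just keeps climbing until the partition heals.

```go
package main

import "fmt"

// node is an illustrative stand-in for a Raft peer's election state.
type node struct {
	term int
}

// onElectionTimeout models what happens each time the election timer fires
// for a node that cannot reach a quorum: it becomes a candidate and
// increments its term, but the vote requests are dropped by the partition.
func (n *node) onElectionTimeout() {
	n.term++
}

func main() {
	minority := &node{term: 5}
	leader := &node{term: 5}
	// While the partition lasts, the isolated node times out over and over.
	for i := 0; i < 10; i++ {
		minority.onElectionTimeout()
	}
	// After the partition heals, the isolated node's term (15) exceeds the
	// leader's (5), which forces the leader to step down and re-elect.
	fmt.Println(minority.term > leader.term) // true
}
```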
Should we add preVote, or is there a better solution for this?
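For reference, the Pre-Vote idea can be sketched roughly as follows (a minimal sketch with illustrative types and names, not the etcd/tinykv API): before incrementing its term, a candidate first asks peers whether they *would* vote for it at term+1; peers that still hear from a live leader refuse, so a partitioned node's term never inflates in the first place.

```go
package main

import "fmt"

// peer is an illustrative stand-in for a Raft peer.
type peer struct {
	term          int
	hasLiveLeader bool
	lastLogIndex  int
}

// grantPreVote: a peer grants a pre-vote only if it has not heard from a
// leader recently and the candidate's log is at least as up to date.
func (p *peer) grantPreVote(candTerm, candLastIndex int) bool {
	if p.hasLiveLeader {
		return false
	}
	return candTerm >= p.term && candLastIndex >= p.lastLogIndex
}

// campaign runs the pre-vote phase; the real term is bumped only on success.
func campaign(c *peer, peers []*peer) bool {
	granted := 1 // vote for self
	for _, p := range peers {
		if p.grantPreVote(c.term+1, c.lastLogIndex) {
			granted++
		}
	}
	if granted > (len(peers)+1)/2 {
		c.term++ // start the real election at the higher term
		return true
	}
	return false // term unchanged: no disruption after the partition heals
}

func main() {
	// A minority node cut off from a healthy majority: its pre-vote fails,
	// so its term stays put and it cannot depose the leader later.
	isolated := &peer{term: 5, lastLogIndex: 10}
	majority := []*peer{
		{term: 5, hasLiveLeader: true, lastLogIndex: 12},
		{term: 5, hasLiveLeader: true, lastLogIndex: 12},
	}
	fmt.Println(campaign(isolated, majority), isolated.term) // false 5
}
```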
Let me share my solution:
First, due to partitioning, if the leader sits in the partition with more nodes, a re-election can occur during the merge because a rejoining follower's term is larger, and that re-election causes the timeout.
Therefore, when the leader receives a response whose term is larger than its own, it should not step down immediately but ignore the message instead. The consequences of this approach are:
- If the follower has the latest log: The follower might get elected as the leader, causing a brief split-brain scenario. However, once the old leader receives a heartbeat from the new leader, it will step down since the new leader’s term is larger.
- If the follower does not have the latest log: we let it update its term after receiving enough rejections (its term may still be larger after one update, but because the rejection messages come from the majority, it will eventually converge to the correct term).
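The tweak above can be sketched like this (illustrative names, not the tinykv message types, though they mirror `eraftpb`): a leader drops higher-term *responses*, but still steps down on higher-term *requests* such as a heartbeat from a genuinely newer leader.

```go
package main

import "fmt"

// raftState is an illustrative stand-in for a peer's Raft state.
type raftState struct {
	term     int
	isLeader bool
}

// isResponse reports whether a message type is a reply (append/heartbeat
// responses) rather than a request originated by another node.
func isResponse(msgType string) bool {
	return msgType == "MsgAppendResponse" || msgType == "MsgHeartbeatResponse"
}

// step handles an incoming message's term; it returns true if the message
// was dropped instead of being processed.
func (r *raftState) step(msgType string, msgTerm int) bool {
	if msgTerm > r.term {
		if r.isLeader && isResponse(msgType) {
			// A rejoining follower came back with an inflated term:
			// ignore it; its term will converge via rejections.
			return true
		}
		// Higher-term request: someone legitimately won an election.
		r.term = msgTerm
		r.isLeader = false
	}
	return false
}

func main() {
	leader := &raftState{term: 5, isLeader: true}
	// Inflated-term response from a rejoining follower: dropped, stay leader.
	fmt.Println(leader.step("MsgAppendResponse", 15), leader.isLeader) // true true
	// Heartbeat from a genuinely newer leader: step down.
	fmt.Println(leader.step("MsgHeartbeat", 16), leader.isLeader) // false false
}
```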
This ensures a smooth leader handover after the partition heals, or avoids a leader change altogether. If you spot any potential corner cases, please let me know.