The handling of EpochNotMatch by CallCommandOnLeader can easily lead to request timeouts

translator_bot · June 23, 2024, 9:24am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: CallCommandOnLeader 对于 EpochNotMatch的处理方式容易导致请求超时

| username: howz97

When CallCommandOnLeader receives an EpochNotMatch error (usually caused by a region split), the test code currently only refreshes the leader and retries. Because the region information carried by the request is never updated, every retry fails with the same error until the call times out (5s). I think the region should be refreshed before retrying.

if resp.Header.Error != nil {
    err := resp.Header.Error
    if err.GetStaleCommand() != nil || err.GetEpochNotMatch() != nil || err.GetNotLeader() != nil {
        log.Debugf("encountered retryable error %+v", resp)
        // fixme: maybe region split when requesting, resp will be EpochNotMatch until timeout
        if err.GetNotLeader() != nil && err.GetNotLeader().Leader != nil {
            leader = err.GetNotLeader().Leader
            log.Debugf("retry on leader peer=%d,%d", leader.Id, leader.StoreId)
        } else {
            leader = c.LeaderOfRegion(regionID)
        }
        continue
    }
}
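
To illustrate the suggestion, here is a minimal, self-contained sketch of "refresh the region before retrying". It does not use the real test cluster or the raft_cmdpb types: Region, Request, fakeCluster, locate, call, and callWithRegionRefresh are all hypothetical names made up for this example, and the epoch is reduced to a single Version number. It only shows the idea that a retry after EpochNotMatch should look the region up again by key, instead of resending the same stale header.

```go
package main

import "errors"

// Hypothetical, simplified stand-ins for the real region and request types.
type Region struct {
	ID       uint64
	StartKey string // inclusive
	EndKey   string // exclusive; "" means unbounded
	Version  uint64 // bumped on every split
}

type Request struct {
	Key      string
	RegionID uint64
	Version  uint64 // region version the client believed in when building the request
}

var (
	ErrEpochNotMatch    = errors.New("epoch not match")
	ErrKeyNotInRegion   = errors.New("key not in region")
	ErrRegionNotFound   = errors.New("region not found")
	ErrRetriesExhausted = errors.New("retries exhausted")
)

// fakeCluster plays the role of the cluster: it knows the current regions.
type fakeCluster struct {
	regions []Region
}

func inRange(r Region, key string) bool {
	return key >= r.StartKey && (r.EndKey == "" || key < r.EndKey)
}

// locate returns the region that currently owns key.
func (c *fakeCluster) locate(key string) (Region, bool) {
	for _, r := range c.regions {
		if inRange(r, key) {
			return r, true
		}
	}
	return Region{}, false
}

// call rejects requests whose region version is stale, like EpochNotMatch.
func (c *fakeCluster) call(req Request) error {
	for _, r := range c.regions {
		if r.ID != req.RegionID {
			continue
		}
		if r.Version != req.Version {
			return ErrEpochNotMatch
		}
		if !inRange(r, req.Key) {
			return ErrKeyNotInRegion
		}
		return nil
	}
	return ErrRegionNotFound
}

// callWithRegionRefresh retries on ErrEpochNotMatch, but first re-resolves
// the region for the key so the next attempt carries fresh region info.
// It returns the number of attempts made.
func callWithRegionRefresh(c *fakeCluster, req Request, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := c.call(req)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrEpochNotMatch) {
			return attempt, err // not retryable in this sketch
		}
		r, ok := c.locate(req.Key)
		if !ok {
			return attempt, ErrRegionNotFound
		}
		req.RegionID, req.Version = r.ID, r.Version
	}
	return maxAttempts, ErrRetriesExhausted
}
```

In this sketch, a request built before a split keeps failing if it is resent unchanged, while callWithRegionRefresh succeeds on the attempt that follows the refresh. Whether the real CallCommandOnLeader should rebuild the header itself or return the error to its caller is a design choice this example does not settle.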

translator_bot · June 23, 2024, 9:24am

| username: TiDBer_6BMuBGMF | Original post link

I think so too…