Call CheckLeader failed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: call CheckLeader failed

| username: TiDBer_uXv8htBz

The TiKV logs occasionally report “call CheckLeader failed,” and each time database performance drops for about 10 minutes. Please help investigate this issue.

| username: Meditator | Original post link

Is a batch update task or a burst of highly concurrent update requests running when this happens? The logs show many lock conflicts.

| username: Lucien-卢西恩 | Original post link

The TiKV logs show that the errors started at 13:00 and continued until 13:09. Did cluster performance recover after those 10 minutes? As suggested above, check whether a batch job was running at that time. Severe lock conflicts are usually caused by highly concurrent requests, so start by investigating the business requests.

| username: TiDBer_uXv8htBz | Original post link

The concurrency is basically the same in every time period, but it is always this same KV node that runs into the problem.

| username: TiDBer_uXv8htBz | Original post link

The concurrency in each time period is basically the same, but this specific KV node experiences this issue, and it always occurs at the top of the hour. The cluster’s performance recovers after 10 minutes.
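
One way to confirm the top-of-the-hour pattern from the logs is to bucket the matching lines by minute. The following is a minimal Python sketch, not an official tool. It assumes each TiKV log line starts with a timestamp of the form `[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ]`; the function names `count_by_minute` and `minute_of_hour_histogram` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: log lines start with "[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ]".
TIMESTAMP = re.compile(r"^\[(\d{4}/\d{2}/\d{2}) (\d{2}):(\d{2}):\d{2}")


def count_by_minute(lines, needle="call CheckLeader failed"):
    """Count lines containing `needle`, keyed by (date, hour, minute)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            date, hour, minute = match.groups()
            counts[(date, int(hour), int(minute))] += 1
    return counts


def minute_of_hour_histogram(counts):
    """Collapse per-minute counts into a histogram over minute-of-hour (0-59)."""
    histogram = Counter()
    for (_date, _hour, minute), n in counts.items():
        histogram[minute] += n
    return histogram


# Usage: python script.py /path/to/tikv.log  (the path is supplied by the caller)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        hist = minute_of_hour_histogram(count_by_minute(f))
    for minute in sorted(hist):
        print(f"minute {minute:02d}: {hist[minute]}")
```

If the errors really do cluster at the top of the hour, almost all of the counts should fall in the first ten minute buckets, which points toward a scheduled job rather than steady business traffic.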

| username: Meditator | Original post link

Check the dashboard or Grafana to see whether other activity in the cluster (such as analyze operations) is increasing the latency of individual transactions and thereby causing the lock conflicts.

| username: tidb菜鸟一只 | Original post link

Check the SQL executed during that time period to see whether any unusually large SQL tasks were running.