How dangerous is the tidb_tikvclient_backoff_seconds_count alert?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb_tikvclient_backoff_seconds_count这个报警多大比较危险

| username: 路在何chu

【TiDB Usage Environment】Production Environment 4013
【Reproduction Path】What operations were performed when the issue occurred
None
【Encountered Issue: Issue Phenomenon and Impact】
Warning: tidb_tikvclient_backoff_seconds_count

cluster: tidb-nova-prod instance: 10.115.27.26:10080 values: 4467.692307692308 status: firing start_time: 2024-01-11 08:10:05 +08:00 end_time: 0001-01-01 08:05:43 +08:05

| username: 路在何chu

It’s already close to 5000. Is this value dangerous? What are your values?

| username: 小龙虾爱大龙虾

It depends. As far as I remember, the default threshold for this alert is 10, which is too low.

| username: 路在何chu

I set it to 3000.

| username: 路在何chu

But the value still reaches the 4000s and 5000s far too often.

| username: 小龙虾爱大龙虾

Then you should analyze it and find out what is causing it.

| username: 小龙虾爱大龙虾

There is no fixed value at which this backoff count is considered dangerous. TiDB retries on its own anyway, so as long as the SQL statements don’t report errors and aren’t slow, it’s fine.

| username: 路在何chu

The SQL is not slow, and there isn’t much impact; the value is just very large. I’m only looking into it.

| username: wangccsy

The unit is milliseconds, which means 5 seconds.

| username: TiDB_C罗

The value is the number of backoff occurrences in 10 minutes.
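To make the "occurrences in 10 minutes" reading concrete, here is a minimal Python sketch of how the growth of a counter over a 10-minute window can be computed and compared with a threshold. It assumes the alert is evaluated on the increase of the counter over the window; the sample data and the function name `counter_increase` are hypothetical, and the 3000 threshold is the custom value mentioned earlier in this thread. Prometheus's own `increase()` also extrapolates to the window edges, which is why an alert can show a fractional value such as 4467.69; that extrapolation is not reproduced here.

```python
from typing import Sequence, Tuple

WINDOW_SECONDS = 600  # 10 minutes
THRESHOLD = 3000      # custom threshold mentioned in the thread (not the default)


def counter_increase(
    samples: Sequence[Tuple[float, float]],
    window: float = WINDOW_SECONDS,
) -> float:
    """Return how much a counter grew during the last `window` seconds.

    `samples` is a time-ordered sequence of (unix_timestamp, counter_value).
    A drop in the counter is treated as a reset (restart from zero).
    """
    if len(samples) < 2:
        return 0.0
    latest_ts = samples[-1][0]
    in_window = [s for s in samples if s[0] >= latest_ts - window]
    total = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # After a reset, everything counted since the restart is new growth.
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical data: one sample per minute, 450 backoffs per minute.
samples = [(t * 60, 100_000 + 450 * t) for t in range(11)]
growth = counter_increase(samples)
print(growth, growth > THRESHOLD)
```

With these made-up samples the counter grows by 4500 within the window, so a threshold of 3000 would fire even though nothing is necessarily wrong with the cluster.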

| username: TiDB_C罗

You should check why it is so large. My cluster has always had this issue, so I added a silence in Alertmanager. If the value suddenly rises well above its usual average, you need to confirm the reason.
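The "sudden increase above the usual average" check can be sketched as a comparison against a baseline instead of a fixed number. This is only an illustration in Python: the function name `is_unusual`, the three-standard-deviation rule, and the history values are assumptions, not something TiDB or Alertmanager provides.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_unusual(history: Sequence[float], current: float, sigmas: float = 3.0) -> bool:
    """Return True when `current` is far above the usual level.

    `history` holds earlier 10-minute backoff counts for the same instance.
    "Far above" means more than `sigmas` standard deviations over the mean.
    """
    if len(history) < 2:
        return False  # not enough data to define "usual"
    baseline = mean(history)
    spread = pstdev(history)
    return current > baseline + sigmas * spread


# Hypothetical history for a cluster that normally sits in the 4000s to 5000s.
history = [4200, 4467, 4800, 5000, 4600]
print(is_unusual(history, 5000))  # within the usual range
print(is_unusual(history, 9000))  # well above the usual range
```

The point of this design is that a cluster with permanent read hotspots keeps a high but stable baseline, so only a departure from that baseline calls for investigation.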

| username: 连连看db

Frequent writes can trigger hotspot region scheduling, splitting, and leader scheduling, which is normal. You can check the logs to see if there are any error messages.

| username: 路在何chu

Actually, I know that all the hotspots are on one table. I have investigated this before, and the only solution is to split that table.

| username: TIDB-Learner

Check the TiKV monitoring. Are there any anomalies?

| username: 路在何chu

No anomalies.

| username: tidb菜鸟一只

If there are hot tables, it’s normal for this value to be high. As long as the business side doesn’t see any issues, it’s not a problem.

| username: 路在何chu

Our read hotspots are always very high, and they all come from the same two or three tables. It’s painful to look at.

| username: 路在何chu

The write hotspots are fine.

| username: tidb菜鸟一只

You can try enabling follower reads to see if it alleviates the issue.

| username: 路在何chu

The key issue is that my version does not support setting this globally yet; it can only be set at the session level.
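When only the session-level setting is available, one workaround is to apply it on every connection the application opens. Below is a minimal Python sketch, assuming the setting in question is the `tidb_replica_read` system variable and that `conn` is any DB-API 2.0 connection from a MySQL-compatible driver. The helper name `enable_follower_read` is hypothetical.

```python
def enable_follower_read(conn) -> None:
    """Turn on follower reads for this session only.

    `conn` is assumed to be a DB-API 2.0 connection to TiDB. The setting is
    lost when the connection closes, so call this for every new connection
    (for example, from a connection pool's "on connect" hook).
    """
    cursor = conn.cursor()
    try:
        cursor.execute("SET SESSION tidb_replica_read = 'follower'")
    finally:
        cursor.close()
```

Because the variable is per session, connections that skip this call keep reading from the leader, so the effect on the hot tables depends on how consistently the application applies it.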