Is it normal for region heartbeats to fluctuate irregularly?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: region 心跳这样定时忽高忽低的正常吗

| username: xingzhenxiang

[TiDB Usage Environment] Production Environment

| username: wangccsy | Original post link

There should be no problem.

| username: 小龙虾爱大龙虾 | Original post link

Joke answer: It’s a heartbeat, so of course it beats :joy_cat:
Serious answer: Newer versions enable Hibernate Region by default, but hibernating Regions can be woken up by GC. GC runs every 10 minutes by default, so you can verify this by comparing the heartbeat graph with the GC-related monitoring panels.
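One way to make that comparison concrete is to measure the spacing between heartbeat spikes and check it against the GC interval. The following is only a minimal sketch: the function names, the threshold, and the synthetic samples are assumptions made for illustration, and in practice the (timestamp, rate) pairs would have to be exported from your own monitoring.

```python
from statistics import median


def spike_period_seconds(samples, threshold):
    """Return the median gap in seconds between the starts of consecutive spikes.

    samples: list of (unix_timestamp_seconds, heartbeat_rate) pairs, sorted by time.
    threshold: rate at or above which a sample counts as part of a spike.
    Returns None when fewer than two spikes are found.
    """
    spike_starts = []
    in_spike = False
    for ts, rate in samples:
        if rate >= threshold and not in_spike:
            spike_starts.append(ts)  # rising edge: a new spike begins here
        in_spike = rate >= threshold
    if len(spike_starts) < 2:
        return None
    gaps = [b - a for a, b in zip(spike_starts, spike_starts[1:])]
    return median(gaps)


def matches_interval(period, interval_seconds, tolerance=0.1):
    """True if the measured period is within `tolerance` (a fraction) of the interval."""
    if period is None:
        return False
    return abs(period - interval_seconds) <= tolerance * interval_seconds


# Synthetic data (not real cluster output): one sample per minute for an hour,
# with a spike every 600 seconds.
samples = [(t, 15000 if t % 600 == 0 else 3) for t in range(0, 3600, 60)]
period = spike_period_seconds(samples, threshold=1000)
print(period, matches_interval(period, 600))
```

If the measured period does not match your configured GC interval, something other than GC is probably waking the Regions.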


| username: xingzhenxiang | Original post link

I’ll look into it, thank you.

| username: Jellybean | Original post link

The number of heartbeats sent depends heavily on the scale of the cluster’s data. Could you share approximate values for the cluster’s total data volume, the number of Regions, and the number of Region leaders?

| username: dba远航 | Original post link

A heartbeat is just like this, otherwise why would it be called a heartbeat?

| username: Kongdom | Original post link

What you said makes a lot of sense, I agree with this view :yum:

| username: xingzhenxiang | Original post link

Here is the information on the Region data volume. Please help check whether it is normal.

| username: 像风一样的男子 | Original post link

A spike every 10 minutes is very regular, which is fine. It would only be a problem if the graph turned into a flat line someday.

| username: xingzhenxiang | Original post link

It keeps fluctuating like this. Is that normal?

| username: xingzhenxiang | Original post link

The key issue is that it jumps from 2.7 to 15k+. Is this normal?

| username: 像风一样的男子 | Original post link

I took a look at mine, and it’s even worse than yours.

| username: 像风一样的男子 | Original post link

The maximum value matches the number of Region leaders on each TiKV node.

| username: forever | Original post link

If the cluster itself has no issues, I don’t think this is a significant problem.

| username: TiDBer_小阿飞 | Original post link

The 10-minute interval is the default GC interval. The pattern is very regular, so there is no problem at all.

| username: xingzhenxiang | Original post link

My GC setting is 1 hour.

| username: xingzhenxiang | Original post link

Take a look at my other cluster, where the swings are even more extreme than in this one. The main issue is that the new and old clusters behave differently, which makes me a bit anxious. This is the old cluster.

| username: TiDBer_小阿飞 | Original post link

tidb_gc_run_interval: Specifies the time interval for running garbage collection (GC)

| username: 春风十里 | Original post link

Is it possible that the change in the number of regions caused by GC is leading to abnormal heartbeat displays?

| username: TiDBer_小阿飞 | Original post link

When there are a large number of Regions, Raftstore has to spend time processing their heartbeats, which adds latency and can keep some read and write requests from being processed promptly. Under heavy read and write pressure, the CPU usage of the Raftstore thread can easily reach its limit, which increases latency further and hurts performance.

In practice, read and write requests are not evenly distributed across Regions but are concentrated on a few of them. The number of messages for temporarily idle Regions can therefore be reduced, which is what Hibernate Region does. When it is not needed, raft-base-tick is skipped, meaning the Raft state machine of an idle Region is not driven. Those Regions then generate no heartbeat messages, which significantly reduces the workload of Raftstore.

Hibernate Region is enabled by default on the TiKV master branch. You can enable or disable this feature as your situation requires. Please refer to Configure Hibernate Region.
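To see why this produces a graph that swings between a low floor and a high peak, here is a toy model in Python. It is only a sketch under stated assumptions and is not TiKV’s implementation: the function, its parameters (`idle_limit`, `wake_all_every`), and the numbers are all hypothetical, and the periodic "wake everything" event simply stands in for something like GC touching every Region.

```python
def simulate(num_regions, hot_regions, ticks, idle_limit, wake_all_every=None):
    """Return, for each tick, how many Regions are awake and would send heartbeats.

    Regions 0..hot_regions-1 see traffic on every tick and never go idle.
    Every other Region hibernates once it has been idle for `idle_limit` ticks.
    If `wake_all_every` is set, all Regions are touched every that many ticks.
    """
    idle = [0] * num_regions  # ticks since each Region last saw activity
    awake_per_tick = []
    for tick in range(ticks):
        wake_all = wake_all_every is not None and tick % wake_all_every == 0
        awake = 0
        for region in range(num_regions):
            if region < hot_regions or wake_all:
                idle[region] = 0       # activity resets the idle counter
            else:
                idle[region] += 1      # another tick without activity
            if idle[region] < idle_limit:
                awake += 1             # not yet hibernating
        awake_per_tick.append(awake)
    return awake_per_tick


# 100 Regions, 5 of them hot, everything woken every 10 ticks.
series = simulate(num_regions=100, hot_regions=5, ticks=30,
                  idle_limit=3, wake_all_every=10)
print(series)
```

In this model the count peaks at the total number of Regions right after each wake-up and falls back to the number of hot Regions once the idle ones hibernate again, which is the same sawtooth shape discussed in this thread.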

hibernate-regions

  • Enables or disables Hibernate Region. When enabled, a Region that remains inactive for a long time is automatically put into a hibernated state. Hibernated Regions reduce the system overhead of heartbeat messages between Leader and Follower. The heartbeat interval between Leader and Follower can be adjusted through peer-stale-state-check-interval.
  • Default value: true for v5.0.2 and later; false for versions earlier than v5.0.2.
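Since the old and new clusters may run different versions, it can help to work out which default each one gets. The helper below is a small illustrative sketch, not part of any TiDB tooling: the function names are made up here, and it encodes only the v5.0.2 cutoff stated above. An explicit hibernate-regions setting in the TiKV configuration would override the default.

```python
def parse_version(version):
    """Turn a string such as 'v5.0.2' or '5.0.0-rc' into a tuple of integers."""
    core = version.lstrip("v").split("-")[0]  # drop the leading 'v' and any suffix
    return tuple(int(part) for part in core.split("."))


def hibernate_regions_default(version):
    """Default of hibernate-regions: true from v5.0.2 onward, false before."""
    return parse_version(version) >= (5, 0, 2)


for v in ("v4.0.16", "v5.0.1", "v5.0.2", "v7.5.0"):
    print(v, hibernate_regions_default(v))
```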