TiKV Load When the Database Has No Insert or Query Activity

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv在数据库无插入查询情况下的负载情况

| username: Qiuchi

【TiDB Usage Environment】Testing
【TiDB Version】6.5.0
【Reproduction Path】What operations were performed to cause the issue
【Encountered Issue: Issue Phenomenon and Impact】
During the time range shown, the database has no load and no analyze jobs are running, yet TiKV's raft store CPU and gRPC poll CPU show load at a fairly fixed interval. Is this normal?
【Resource Configuration】


【Attachments: Screenshots/Logs/Monitoring】

It seems related to hibernate regions, but why are regions still being woken up when the cluster has no load?

In addition, this is also quite noticeable in our high-load production environment, and raftstore CPU usage differs significantly across the six TiKV nodes.
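
For reference, the relevant settings can be checked from any TiDB SQL client roughly like this (a minimal sketch using the standard config item names; worth verifying against the docs for your version):

```sql
-- Hibernate regions: enabled by default in recent versions; it reduces
-- background Raft ticking for idle regions.
SHOW CONFIG WHERE type = 'tikv' AND name = 'raftstore.hibernate-regions';

-- Raft store thread pool size (default 2); the raft store CPU panel is
-- capped at roughly store-pool-size * 100% per instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'raftstore.store-pool-size';
```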

| username: zhaokede | Original post link

Similar to heartbeat messages.

| username: TiDBer_刚 | Original post link

Because the load is too low, even a small scheduled task becomes very noticeable?

| username: Qiuchi | Original post link

Do heartbeat messages consume so many resources?

| username: Qiuchi | Original post link

In our production environment this phenomenon is quite noticeable even under load. The raft store thread count is the default of 2, and those threads look close to maxed out. We currently have 6 TiKV nodes with regions distributed basically evenly, so why does raftstore CPU usage differ by more than 2x between nodes?
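
One way to compare the setting with the actual per-store distribution is a query along these lines (a sketch using the standard information_schema tables; a leader imbalance, not just a region-count imbalance, can show up as a raftstore CPU gap between nodes):

```sql
-- Leader vs. region (peer) count per TiKV store, as reported by PD.
SELECT store_id, address, leader_count, region_count
FROM information_schema.tikv_store_status
ORDER BY leader_count DESC;
```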

| username: 小于同学 | Original post link

Because the load is too low?

| username: Qiuchi | Original post link

I've adjusted the problem description. We also run into this in production, and the same issue occurs there.

| username: tidb狂热爱好者 | Original post link

It’s clearly abnormal for your raftstore CPU to be consuming 2 full cores (200%).

| username: tidb狂热爱好者 | Original post link

There should be no load in that situation.

| username: Qiuchi | Original post link

Yes, it’s very strange. I remember this only started after I upgraded from 4.x to 6.5, but I always assumed it wasn’t a problem. Recently I needed to do some performance tuning, so I started paying attention to it.

| username: 小龙虾爱大龙虾 | Original post link

Take a look at the kv_request panel. Also, how many regions does your current TiKV instance have?
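
(A rough way to check that from SQL, as a sketch; this counts every Raft peer hosted by each store, leaders included:)

```sql
-- Raft peers per TiKV store, and how many of them are leaders.
SELECT store_id,
       COUNT(*)       AS peer_count,
       SUM(is_leader) AS leader_count
FROM information_schema.tikv_region_peers
GROUP BY store_id;
```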

| username: 有猫万事足 | Original post link

Hibernate regions are generally only needed at very large data scales (500 TB+): PD manages region metadata as a single point, and once it hits its resource bottleneck the feature has to be enabled.

Judging by your cluster, that doesn’t seem to apply here. Enabling hibernate regions may be intended as an optimization, but the direction seems off and it has introduced some extra bugs.

If the goal is optimization, it’s better to address the actual problem points directly rather than getting tangled up in hibernate regions.

| username: Qiuchi | Original post link

The TiKV in the test cluster previously had fewer than 20,000 regions.

| username: Qiuchi | Original post link

After disabling hibernate regions, the TiKV load stays at a level similar to when they are enabled, basically like the situation in the screenshot below. So it isn’t necessarily tied to hibernate regions; it’s just that this configuration item seems to influence the current behavior.
[screenshot]
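
For reference, the toggle discussed here would look roughly like this from a SQL client, assuming `raftstore.hibernate-regions` is online-configurable in this version (an assumption; if not, the change has to go through the TiKV config file / tiup edit-config and a restart):

```sql
-- Disable hibernate regions on all TiKV instances; SET CONFIG changes are
-- applied online and are NOT persisted across restarts.
SET CONFIG tikv `raftstore.hibernate-regions` = 'false';

-- Confirm the value actually changed on every instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'raftstore.hibernate-regions';
```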

| username: Qiuchi | Original post link

Could it be caused by GC? I also found something else: even with tidb_gc_enable set to false, TiDB still performs GC operations; only the GC safe point stops advancing. With GC disabled, are operations like resolve locks and delete ranges still necessary? And why is the resolve-locks interval so short?
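
The GC state can be inspected like this (a sketch using the standard GC variables and the bookkeeping rows TiDB keeps in mysql.tidb):

```sql
-- GC-related system variables (tidb_gc_enable, tidb_gc_run_interval,
-- tidb_gc_life_time, ...).
SHOW VARIABLES LIKE 'tidb_gc%';

-- GC bookkeeping kept by TiDB: current GC leader, last run time, safe point.
SELECT variable_name, variable_value
FROM mysql.tidb
WHERE variable_name LIKE 'tikv_gc%';
```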

| username: 洪七表哥 | Original post link

Study and learn

| username: ziptoam | Original post link

This question goes fairly deep. You’d need to see which tasks are executed automatically; heartbeats, GC, and others all seem possible.

| username: Qiuchi | Original post link

I just upgraded from 6.5.0 to 6.5.9 and the problem is gone… The monitoring in the new version matches the description in this GC issue:
txn: unexpected high frequency gc · Issue #40759 · pingcap/tidb (github.com)

| username: 呢莫不爱吃鱼 | Original post link

Study and learn