TiKV Load When the Database Has No Insert or Query Activity

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv在数据库无插入查询情况下的负载情况

| username: Qiuchi

【TiDB Usage Environment】Testing
【TiDB Version】6.5.0
【Reproduction Path】What operations were performed to cause the issue
【Encountered Issue: Issue Phenomenon and Impact】
During the time range shown, the database has no load and no analyze jobs are running, yet TiKV's raft store CPU and gRPC poll CPU show load at a fairly fixed interval. Is this normal?
【Resource Configuration】


【Attachments: Screenshots/Logs/Monitoring】

It seems related to hibernate regions, but why are regions still being woken up when the cluster has no load?

In addition, this is also quite noticeable in our high-load production environment, and raftstore CPU usage differs significantly across the six TiKV nodes.
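
For reference, the relevant settings can be checked from any TiDB SQL client roughly like this (a minimal sketch using the standard config item names; worth verifying against the docs for your version):

```sql
-- Hibernate regions: enabled by default in recent versions; it reduces
-- background Raft ticking for idle regions.
SHOW CONFIG WHERE type = 'tikv' AND name = 'raftstore.hibernate-regions';

-- Raft store thread pool size (default 2); the raft store CPU panel is
-- capped at roughly store-pool-size * 100% per instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'raftstore.store-pool-size';
```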

| username: zhaokede | Original post link

Similar to heartbeat messages.

| username: TiDBer_刚 | Original post link

Because the load is too low, even a small scheduled task becomes very noticeable?

| username: Qiuchi | Original post link

Do heartbeat messages consume so many resources?

| username: Qiuchi | Original post link

In our production environment this phenomenon is quite noticeable even under load. The raft store thread count is the default of 2, and those threads look close to maxed out. We currently have 6 TiKV nodes with regions distributed basically evenly, so why does raftstore CPU usage differ by more than 2x between nodes?
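
One way to compare the setting with the actual per-store distribution is a query along these lines (a sketch using the standard information_schema tables; a leader imbalance, not just a region-count imbalance, can show up as a raftstore CPU gap between nodes):

```sql
-- Leader vs. region (peer) count per TiKV store, as reported by PD.
SELECT store_id, address, leader_count, region_count
FROM information_schema.tikv_store_status
ORDER BY leader_count DESC;
```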

| username: 小于同学 | Original post link

Because the load is too low?

| username: Qiuchi | Original post link

I've adjusted the problem description. We also run into this in production, and the same issue occurs there.

| username: tidb狂热爱好者 | Original post link

It’s clearly abnormal for your raftstore CPU to be consuming 2 full cores (200%).

| username: tidb狂热爱好者 | Original post link

There should be no load in that situation.

| username: Qiuchi | Original post link

Yes, it’s very strange. I remember this only started after I upgraded from 4.x to 6.5, but I always assumed it wasn’t a problem. Recently I needed to do some performance tuning, so I started paying attention to it.

| username: 小龙虾爱大龙虾 | Original post link

Take a look at the kv_request panel. Also, how many regions does your current TiKV instance have?
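
(A rough way to check that from SQL, as a sketch; this counts every Raft peer hosted by each store, leaders included:)

```sql
-- Raft peers per TiKV store, and how many of them are leaders.
SELECT store_id,
       COUNT(*)       AS peer_count,
       SUM(is_leader) AS leader_count
FROM information_schema.tikv_region_peers
GROUP BY store_id;
```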

| username: 有猫万事足 | Original post link

Hibernate regions are generally only needed at very large data scales (500 TB+): PD manages region metadata as a single point, and once it hits its resource bottleneck the feature has to be enabled.

Judging by your cluster, that doesn’t seem to apply here. Enabling hibernate regions may be intended as an optimization, but the direction seems off and it has introduced some extra bugs.

If the goal is optimization, it’s better to address the actual problem points directly rather than getting tangled up in hibernate regions.

| username: Qiuchi | Original post link

The TiKV in the test cluster previously had fewer than 20,000 regions.

| username: Qiuchi | Original post link

After disabling hibernate regions, the TiKV load stays at a level similar to when they are enabled, basically like the situation in the screenshot below. So it isn’t necessarily tied to hibernate regions; it’s just that this configuration item seems to influence the current behavior.
[screenshot]
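
For reference, the toggle discussed here would look roughly like this from a SQL client, assuming `raftstore.hibernate-regions` is online-configurable in this version (an assumption; if not, the change has to go through the TiKV config file / tiup edit-config and a restart):

```sql
-- Disable hibernate regions on all TiKV instances; SET CONFIG changes are
-- applied online and are NOT persisted across restarts.
SET CONFIG tikv `raftstore.hibernate-regions` = 'false';

-- Confirm the value actually changed on every instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'raftstore.hibernate-regions';
```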

| username: Qiuchi | Original post link

Could it be caused by GC? I also found something else: even with tidb_gc_enable set to false, TiDB still performs GC operations; only the GC safe point stops advancing. With GC disabled, are operations like resolve locks and delete ranges still necessary? And why is the resolve-locks interval so short?
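
The GC state can be inspected like this (a sketch using the standard GC variables and the bookkeeping rows TiDB keeps in mysql.tidb):

```sql
-- GC-related system variables (tidb_gc_enable, tidb_gc_run_interval,
-- tidb_gc_life_time, ...).
SHOW VARIABLES LIKE 'tidb_gc%';

-- GC bookkeeping kept by TiDB: current GC leader, last run time, safe point.
SELECT variable_name, variable_value
FROM mysql.tidb
WHERE variable_name LIKE 'tikv_gc%';
```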

| username: 洪七表哥 | Original post link

Study and learn

| username: ziptoam | Original post link

This question goes fairly deep. You’d need to see which tasks are executed automatically; heartbeats, GC, and others all seem possible.

| username: Qiuchi | Original post link

I just upgraded from 6.5.0 to 6.5.9 and the problem is gone… The monitoring in the new version matches the description in this GC issue:
txn: unexpected high frequency gc · Issue #40759 · pingcap/tidb (github.com)

| username: 呢莫不爱吃鱼 | Original post link

Study and learn