TiKV Node CPU Imbalance

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv节点cpu不均衡

| username: 路在何chu

[TiDB Usage Environment] Production Environment
[Reproduction Path] What operations were performed to cause the problem
The CPU usage of one node is higher than that of the other three nodes.
[Encountered Problem: Problem Phenomenon and Impact]


Further investigation found that the high usage comes from unified-readpool-cpu.
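For reference, the unified read pool settings can be checked from TiDB itself; something along these lines should work on v4.0 and later:

```sql
-- Show the unified read pool settings of every TiKV instance
-- (readpool.unified.max-thread-count defaults to 80% of the node's cores)
SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'readpool.unified%';
```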

| username: 路在何chu | Original post link

Moreover, the number of running tasks on the high-CPU node is higher than on the other three nodes.

| username: 路在何chu | Original post link

I have already checked the CPU model, core count, and memory of the four nodes. The disk types are all the same.

| username: 路在何chu | Original post link

Moreover, the number of leaders and regions on these four nodes is almost the same.

| username: dba远航 | Original post link

Is it possible that the leaders of the data hit by the relevant business queries happen to sit on this high-load node?

| username: 路在何chu | Original post link

It can't always be on this node, though. Leaders get transferred too, right? I suspect it's caused by those running tasks, but what tasks are there, and how do I check them?
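One thing I can try is listing what is currently executing from TiDB (this only shows SQL-level activity, not TiKV-internal tasks, but it should reveal which queries are running at the time):

```sql
-- List currently running statements on all TiDB instances, longest first
SELECT instance, id, time, info
FROM information_schema.cluster_processlist
WHERE info IS NOT NULL
ORDER BY time DESC;
```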

| username: 小龙虾爱大龙虾 | Original post link

Optimize the slow SQL.
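If the slow log is enabled (it is by default), something like this pulls the recent worst offenders:

```sql
-- Slowest statements in the last hour across all TiDB instances
SELECT time, query_time, query
FROM information_schema.cluster_slow_query
WHERE time > NOW() - INTERVAL 1 HOUR
ORDER BY query_time DESC
LIMIT 10;
```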

| username: tidb菜鸟一只 | Original post link

unified-readpool-cpu climbs when your queries scan large amounts of data, which pushes CPU usage up. Check the distribution of leaders and regions across the TiKV nodes: is it possible that the high-CPU node holds more leaders?
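If Grafana isn't handy, a rough leader count per store can be pulled from TiDB like this:

```sql
-- Count region leaders per TiKV store
SELECT store_id, COUNT(*) AS leader_count
FROM information_schema.tikv_region_peers
WHERE is_leader = 1
GROUP BY store_id
ORDER BY leader_count DESC;
```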

| username: 路在何chu | Original post link

Overall CPU usage is not high, and there are no slow SQL queries at those times; all slow SQL has already been optimized.

| username: 路在何chu | Original post link

Almost the same.

| username: tidb菜鸟一只 | Original post link

Is there a hotspot? Check the corresponding hotspot tables to see whether their region leaders all sit on the high-CPU node.
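The current read hotspots can also be queried from TiDB directly, roughly like this (the column set may differ slightly across versions):

```sql
-- Hottest read regions and the tables they belong to
SELECT db_name, table_name, region_id, flow_bytes
FROM information_schema.tidb_hot_regions
WHERE type = 'read'
ORDER BY flow_bytes DESC
LIMIT 20;
```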

| username: 路在何chu | Original post link

The read hotspot is very high. I'll check how many regions of those hotspot tables each node holds and take a look.
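Something like this should count, per store, how many leaders a given table has (mydb and hot_table stand in for the actual schema and table names):

```sql
-- Leaders of one table, grouped by the TiKV store that holds them
SELECT p.store_id, COUNT(*) AS leader_count
FROM information_schema.tikv_region_status s
JOIN information_schema.tikv_region_peers p ON s.region_id = p.region_id
WHERE s.db_name = 'mydb'
  AND s.table_name = 'hot_table'
  AND p.is_leader = 1
GROUP BY p.store_id
ORDER BY leader_count DESC;
```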

| username: 路在何chu | Original post link

I checked and found that these two hotspot tables indeed have the most leaders.

| username: 路在何chu | Original post link

The TiKV store with ID 144 indeed has a lot of hotspot leaders. How can we move them? Can we intervene manually?

| username: tidb菜鸟一只 | Original post link

You can follow the standard hotspot handling steps for the hotspot tables: Troubleshoot Hotspot Issues | PingCAP Docs
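For example, two common mitigations from that doc look roughly like this (hot_table and idx_hot are placeholders, and SHARD_ROW_ID_BITS only applies to tables that use the hidden _tidb_rowid, i.e. no clustered primary key):

```sql
-- Scatter the hidden row ID so writes/reads spread out (non-clustered tables only)
ALTER TABLE hot_table SHARD_ROW_ID_BITS = 4;

-- Pre-split a hot index into 16 regions so its leaders spread across stores
SPLIT TABLE hot_table INDEX idx_hot BETWEEN ("a") AND ("z") REGIONS 16;
```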

| username: 路在何chu | Original post link

These tables have already been sharded. Our primary key IDs are randomly generated strings, not auto-incremented.

| username: tidb菜鸟一只 | Original post link

It can be moved manually using pd-ctl: `operator add transfer-leader 2 5` moves the leader of Region 2 to Store 5. But that's too troublesome. Normally, if the keys are random, the distribution should be fairly even, right…
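To find the region IDs and their current leader stores to feed into pd-ctl, SHOW TABLE ... REGIONS works (hot_table is a placeholder):

```sql
-- REGION_ID and LEADER_STORE_ID map directly onto the pd-ctl operator arguments
SHOW TABLE hot_table REGIONS;
```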
| username: TiDBer_5Vo9nD1u | Original post link

Are the configurations of each host the same?

| username: 路在何chu | Original post link

It must be that the regions of this hotspot table mostly sit on node 114. I checked other tables, and the number of regions per instance varies from table to table; it's a matter of probability.

| username: 路在何chu | Original post link

The configurations are definitely the same.