TiKV CPU Usage Continues to Surge

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv cpu 持续高涨

| username: 表渣渣渣

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.1
[Encountered Problem: Phenomenon and Impact]
Since yesterday afternoon, the CPU usage of a single TiKV node has stayed consistently high, and it appears to be strongly correlated with the table write tasks:

Checking the machine's CPU status shows TiKV's CPU usage at around 600%. All query connections become unavailable, leaving TiDB completely paralyzed.
These tasks have been running for a long time. Could this be an anomaly caused by Region splitting? Strangely, the problem occurs only on this one machine, and today it has already happened for the third time.

Temporary Solution: Restart the single TiKV
Long-term Countermeasure:

[Resource Configuration] 3 machines, each with 4 cores and 64 GB
[Attachments: Screenshots/Logs/Monitoring]
Below are the logs recorded while TiKV was behaving abnormally:
logs.txt (47.3 KB)

| username: zhanggame1 | Original post link

The configuration is on the low side. Each node runs TiKV, PD, and TiDB on only 4 cores, so the three machines total just 12 cores. Combined, that is weaker than my laptop, which has 20 logical cores.

| username: 表渣渣渣 | Original post link

The company won't add resources. Even if you ask for them, you won't get them.

| username: Billmay表妹 | Original post link

Tell your boss!

More resources are needed.
With these resources, this is about as well as it can run.

I think you can’t use diesel in a car that requires 95 octane!
If this is the configuration, then choosing MySQL would be more friendly to the operations team.

| username: 有猫万事足 | Original post link

Is your PD leader also on 154?

| username: ffeenn | Original post link

Try to upgrade the configuration.

| username: tidb菜鸟一只 | Original post link

Your cluster topology is a bit strange. You have 3 PDs, 3 TiDBs, 2 TiKVs, and 1 TiFlash? How many replicas do you have for TiKV?
Is the abnormal machine the one whose IP ends in 154? Does that machine host PD, TiDB, and TiFlash together?

| username: 表渣渣渣 | Original post link

How do I check this? I looked at the PD logs and didn’t find any information related to the leader.

| username: 表渣渣渣 | Original post link

Yes, this is the topology. It is the minimal topology recommended by TiDB, deployed across three machines 🤡

| username: tidb菜鸟一只 | Original post link

So the abnormal machine is the one running TiFlash, isn't it? In Grafana, check the status of that TiFlash node at the time of the incident.

| username: 表渣渣渣 | Original post link

This TiFlash node doesn't hold a single table. I tested it earlier, but it didn't deliver the results I wanted, so I stopped using it and removed all the tables from it.

| username: tidb菜鸟一只 | Original post link

Could you please use “tiup cluster display <cluster_name>” to check?

| username: 像风一样的男子 | Original post link

It looks like there is no TiKV deployed on machine 154.

| username: 表渣渣渣 | Original post link

I stopped it because its CPU usage was too high. After a short pause and a restart, it was fine.

| username: 表渣渣渣 | Original post link

You can name this cluster “Zhazhahui”.

| username: 表渣渣渣 | Original post link

The name contains sensitive information, I can’t give it to you.

| username: tidb菜鸟一只 | Original post link

No, what I mean is that you should execute the command
tiup cluster display <cluster_name>
and check the display results. I still think you only have 2 TiKV nodes, which seems a bit strange.
You can mask sensitive information, but please show the IP column, node column, and status column.
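
For anyone following along, here is a minimal Python sketch of how the pasted output of that command could be condensed to the columns requested above (node role, host, status), grouped by machine. It assumes the usual column order printed by `tiup cluster display` (ID, Role, Host, Ports, OS/Arch, Status, ...) and that the PD leader is marked with an `L` flag in the Status column (for example `Up|L`). The function name `summarize_display` and the sample text are hypothetical and are not taken from the poster's cluster.

```python
from collections import defaultdict


def summarize_display(text):
    """Group (role, status) pairs by host from pasted `tiup cluster display` text.

    Assumption: columns appear in the order ID, Role, Host, Ports, OS/Arch,
    Status, ... Returns (roles_by_host, pd_leader_host); pd_leader_host is
    None when no PD row carries the L flag.
    """
    roles_by_host = defaultdict(list)
    pd_leader_host = None
    in_table = False
    for line in text.splitlines():
        cols = line.split()
        if cols[:3] == ["ID", "Role", "Host"]:
            in_table = True  # header row found; data rows follow
            continue
        # Skip anything before the header, short lines such as
        # "Total nodes: N", and the dashed separator row.
        if not in_table or len(cols) < 6 or all(set(c) == {"-"} for c in cols):
            continue
        # A two-word status such as "Pending Offline" is cut to its first word.
        role, host, status = cols[1], cols[2], cols[5]
        roles_by_host[host].append((role, status))
        if role == "pd" and "L" in status.split("|"):
            pd_leader_host = host
    return dict(roles_by_host), pd_leader_host


# Hypothetical output for illustration only; not the poster's cluster.
SAMPLE = """\
ID              Role     Host      Ports        OS/Arch       Status   Data Dir       Deploy Dir
--              ----     ----      -----        -------       ------   --------       ----------
10.0.0.1:2379   pd       10.0.0.1  2379/2380    linux/x86_64  Up|L|UI  /data/pd       /deploy/pd
10.0.0.2:2379   pd       10.0.0.2  2379/2380    linux/x86_64  Up       /data/pd       /deploy/pd
10.0.0.3:2379   pd       10.0.0.3  2379/2380    linux/x86_64  Up       /data/pd       /deploy/pd
10.0.0.1:4000   tidb     10.0.0.1  4000/10080   linux/x86_64  Up       -              /deploy/tidb
10.0.0.2:4000   tidb     10.0.0.2  4000/10080   linux/x86_64  Up       -              /deploy/tidb
10.0.0.3:4000   tidb     10.0.0.3  4000/10080   linux/x86_64  Up       -              /deploy/tidb
10.0.0.1:20160  tikv     10.0.0.1  20160/20180  linux/x86_64  Up       /data/tikv     /deploy/tikv
10.0.0.2:20160  tikv     10.0.0.2  20160/20180  linux/x86_64  Up       /data/tikv     /deploy/tikv
10.0.0.3:9000   tiflash  10.0.0.3  9000/8123    linux/x86_64  Up       /data/tiflash  /deploy/tiflash
Total nodes: 9
"""

roles_by_host, pd_leader_host = summarize_display(SAMPLE)
for host, roles in sorted(roles_by_host.items()):
    print(host, roles)
print("PD leader host:", pd_leader_host)
```

Replace `SAMPLE` with the real command output saved as text. The per-host grouping makes it easy to see how many TiKV nodes exist and which components share a machine with each of them, without posting the data and deploy directories.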

| username: redgame | Original post link

It looks like a lack of resources.

| username: cy6301567 | Original post link

Insufficient resources; the setup is too bare-bones.

| username: cassblanca | Original post link

The 4 CPUs are probably not even dedicated; they may be virtual machine cores sharing the physical host's resources. How could TiDB be comfortable on this configuration? When it isn't comfortable, it will throw the occasional tantrum.