TiKV CPU Usage Continues to Surge

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.1
[Problem Encountered: Symptoms and Impact]
Since yesterday afternoon, the CPU usage of one TiKV node has stayed high, and it appears closely correlated with the tasks that write to tables.

On that machine, TiKV's CPU usage reaches around 600%, no queries can get through, and TiDB is completely paralyzed.
These tasks have been running for a long time. Could this be an anomaly caused by region splitting? What's strange is that the problem only occurs on this one machine, and it has already happened three times today.

Temporary workaround: restart the affected TiKV node
Long-term Countermeasure:

[Resource Configuration] 4 cores, 64 GB memory × 3 machines
[Attachments: Screenshots/Logs/Monitoring]
Below are the logs recorded while TiKV was behaving abnormally.
The configuration is quite low. Each node runs TiKV, PD, and TiDB on only 4 cores, so the three machines have just 12 cores in total. Combined, they are weaker than my laptop, which has 20 logical cores.

The company won't give us more resources. Even if we ask, they won't.

Tell your boss!

More resources are needed.
With these resources, this is about as good as it gets.

You can't put diesel in a car that needs 95-octane gasoline!
With a configuration like this, MySQL would be easier on the operations team.

Is your PD leader also on 154?

Try upgrading the hardware.

Your cluster topology is a bit strange. You have 3 PDs, 3 TiDBs, 2 TiKVs, and 1 TiFlash? How many replicas is TiKV configured with?
Is the abnormal machine the one whose IP ends in 154? Is that the machine running PD, TiDB, and TiFlash together?

How do I check this? I looked at the PD logs and didn’t find any information related to the leader.

Yes, that's the topology. It's the minimum topology TiDB recommends, deployed across three machines 🤡

So the problematic machine is the one running TiFlash? Check that TiFlash node's status in Grafana around the time of the incident.

This TiFlash node doesn't hold a single table. I tested TiFlash earlier, but it didn't give the results I wanted, so I stopped using it and deleted all the tables in it.

Could you please use “tiup cluster display <cluster_name>” to check?

It looks like there is no TiKV deployed on machine 154.

I stopped it because its CPU usage was too high. After a short pause I restarted it, and it was fine.

You can name this cluster “Zhazhahui”.

The name contains sensitive information, I can’t give it to you.

No, I mean you should run
tiup cluster display <cluster_name>
and look at the output. I still think you only have 2 TiKV nodes, which seems odd.
You can mask sensitive information, but please show the IP, node, and status columns.
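A small helper for reading that output follows. It is only a sketch: it assumes the `tiup cluster display` table has whitespace-separated columns in the order ID, Role, Host, Ports, OS/Arch, Status, Data Dir, Deploy Dir, and that the PD leader's Status carries an `|L` flag (for example `Up|L|UI`). The helper names, sample output, and IPs are hypothetical. The `max_replicas=3` default mirrors PD's default `max-replicas`. The real value can be checked with `tiup ctl:v5.4.1 pd -u http://<pd-host>:2379 config show replication`, and `tiup ctl:v5.4.1 pd -u http://<pd-host>:2379 member leader show` reports the PD leader directly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instance:
    node_id: str
    role: str
    host: str
    status: str


def parse_display(text: str) -> list[Instance]:
    """Parse the instance table printed by `tiup cluster display`.

    Assumed layout (hypothetical parser): whitespace-separated columns
    ID, Role, Host, Ports, OS/Arch, Status, Data Dir, Deploy Dir, with a
    row of dashes between the header and the instance rows.
    """
    instances: list[Instance] = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("--"):
            in_table = True  # the dashed row marks the start of instance rows
            continue
        if not in_table or not stripped or stripped.startswith("Total nodes"):
            continue
        fields = stripped.split()
        if len(fields) < 6:
            continue  # not an instance row
        instances.append(Instance(fields[0], fields[1], fields[2], fields[5]))
    return instances


def summarize(instances: list[Instance], max_replicas: int = 3) -> dict:
    """Report the PD leader host and whether there are enough TiKV nodes.

    max_replicas=3 is PD's default; pass the cluster's real value if it differs.
    """
    leaders = [i.host for i in instances if i.role == "pd" and "|L" in i.status]
    tikv_count = sum(1 for i in instances if i.role == "tikv")
    return {
        "pd_leader": leaders[0] if leaders else None,
        "tikv_count": tikv_count,
        "enough_tikv_for_replicas": tikv_count >= max_replicas,
    }


# Hypothetical, abbreviated output with masked IPs, for illustration only.
SAMPLE = """\
Cluster type:       tidb
Cluster version:    v5.4.1
ID                Role  Host        Ports        OS/Arch       Status   Data Dir            Deploy Dir
--                ----  ----        -----        -------       ------   --------            ----------
10.0.0.154:2379   pd    10.0.0.154  2379/2380    linux/x86_64  Up|L|UI  /tidb-data/pd-2379  /tidb-deploy/pd-2379
10.0.0.152:2379   pd    10.0.0.152  2379/2380    linux/x86_64  Up|UI    /tidb-data/pd-2379  /tidb-deploy/pd-2379
10.0.0.152:20160  tikv  10.0.0.152  20160/20180  linux/x86_64  Up       /tidb-data/tikv     /tidb-deploy/tikv
10.0.0.153:20160  tikv  10.0.0.153  20160/20180  linux/x86_64  Up       /tidb-data/tikv     /tidb-deploy/tikv
Total nodes: 4
"""

if __name__ == "__main__":
    # prints {'pd_leader': '10.0.0.154', 'tikv_count': 2, 'enough_tikv_for_replicas': False}
    print(summarize(parse_display(SAMPLE)))
```

With 2 TiKV nodes and PD's default of 3 replicas, PD cannot place a third replica of each region, which is one reason a 2-TiKV topology looks odd.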

It looks like a lack of resources.

Insufficient resources; the setup is too underpowered.

Those 4 CPUs probably aren't dedicated; they may be vCPUs on a virtual machine sharing the physical host's resources. How is TiDB supposed to be happy on a configuration like this? And when it isn't happy, it will act up every now and then.