Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb某一个节点io util利用率很高 (IO util is very high on one TiDB node)
[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.2
[Encountered Problem: Phenomenon and Impact] The IO utilization of one TiDB node has suddenly become very high
[Resource Configuration] The cluster has 3 nodes in total, each deployed with 1 PD, 1 TiKV, and 1 tidb-server
[Attachment: Screenshot/Log/Monitoring]

Generally, tidb-server itself does very little disk reading or writing. Check whether the logs in its log directory have grown significantly.
Which disk shows the high utilization, and is the figure from node exporter or from the disk performance monitoring? Which components are deployed on that node?
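To compare log growth across the three nodes, something like the following could be run on each one. This is only a minimal sketch: the `*.log*` pattern is an assumption about how the log files are named, and the log directory has to be passed in, since it depends on the deployment.

```python
import sys
from pathlib import Path


def summarize_logs(log_dir, pattern="*.log*"):
    """Return (file_count, total_bytes) for files in log_dir matching pattern."""
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Pass one or more log directories on the command line (paths are deployment-specific).
    for log_dir in sys.argv[1:]:
        count, size = summarize_logs(log_dir)
        print(f"{log_dir}: {count} files, {size / 1024 / 1024:.1f} MiB")
```

Running it on all three nodes and comparing the file counts and sizes should show whether one node is logging far more than the others.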
There are about 9 more log files than the other two nodes.
This node has an NVMe disk attached, and it runs 1 PD, 1 tidb-server, and 1 TiKV.
Can you check the screenshot to see when the spike occurred? Also, check if the “thread cpu → unified pool” in TiKV detail has increased.
It has probably been going on for a while. I can't find the exact point in time when it spiked.
Are there export and backup tasks?
Yes, and they do connect to this node.
However, they run on a schedule.
Check which service is consuming high CPU.
I misread it. I thought it was a dedicated TiDB node, but it turns out to be a mixed deployment.
tidb-server is occupying a high amount of resources.
See if it's caused by the automatic statistics collection tasks. Search for auto analyze entries in the tidb-server logs.
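A rough way to do that search and see when the entries cluster is sketched below. It assumes that the relevant entries contain the phrase "auto analyze" and that each log line starts with a `[YYYY/MM/DD HH:MM:SS...]` timestamp; the exact message text may differ between versions, so adjust the pattern if nothing matches.

```python
import re
import sys
from collections import Counter

# Assumption: auto analyze entries contain this phrase (matched case-insensitively).
AUTO_ANALYZE = re.compile(r"auto analyze", re.IGNORECASE)
# Assumption: lines start with a timestamp such as [2023/11/01 10:15:32.123 +08:00].
HOUR_PREFIX = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2})")


def find_auto_analyze(lines):
    """Yield the log lines that mention auto analyze."""
    for line in lines:
        if AUTO_ANALYZE.search(line):
            yield line.rstrip("\n")


def count_by_hour(lines):
    """Count auto analyze lines per hour, keyed by 'YYYY/MM/DD HH'."""
    counts = Counter()
    for line in find_auto_analyze(lines):
        match = HOUR_PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Pass the tidb-server log files on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for hour, n in sorted(count_by_hour(f).items()):
                print(f"{path} {hour}: {n} auto analyze entries")
```

If the hours with many entries line up with the IO spikes in monitoring, auto analyze is a likely suspect.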
For hybrid deployment, you can use the iotop command to see which specific component is using the disk more and the specific IO values.
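If iotop is not installed, roughly the same per-process numbers can be read from `/proc/<pid>/io` on Linux. The sketch below samples the counters twice and prints read and write rates. It assumes the processes are named `tidb-server`, `tikv-server`, and `pd-server`, and it usually has to run as root to read other users' processes. The helper names are made up for this example.

```python
import os
import time


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_io_by_name(names):
    """Map 'name(pid)' to IO counters for running processes whose name is in names."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            if comm in names:
                with open(f"/proc/{pid}/io") as f:
                    result[f"{comm}({pid})"] = parse_proc_io(f.read())
        except OSError:
            continue  # process exited, or permission denied
    return result


def io_rates(before, after, seconds):
    """Return {process: (read_bytes_per_s, write_bytes_per_s)} between two samples."""
    rates = {}
    for proc, now in after.items():
        prev = before.get(proc)
        if prev is not None:
            rates[proc] = (
                (now["read_bytes"] - prev["read_bytes"]) / seconds,
                (now["write_bytes"] - prev["write_bytes"]) / seconds,
            )
    return rates


def main(interval=5.0):
    # Assumed process names for a mixed deployment; adjust if yours differ.
    names = {"tidb-server", "tikv-server", "pd-server"}
    first = read_io_by_name(names)
    time.sleep(interval)
    second = read_io_by_name(names)
    for proc, (r, w) in sorted(io_rates(first, second, interval).items()):
        print(f"{proc}: read {r / 1024:.1f} KiB/s, write {w / 1024:.1f} KiB/s")
```

Call `main()` on the affected node while the IO utilization is high. `read_bytes` and `write_bytes` count traffic that actually reached the storage layer, so the component with the largest rates is the one to look at first.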
The tikv-server process writes to tikv.log. Most of the time the rate is tens of KB/s, and at peak it is over 100 KB/s.
This is the normal state, not when there is a problem, right?
Is it a Lightning import? Does it have to connect directly to a single node?
It's not a Lightning import.
Check the Top SQL page in TiDB Dashboard to see what is being executed.