Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: pd leader cpu负载高 (High CPU load on the PD leader)
[TiDB Usage Environment] Production environment
[TiDB Version] v5.4.3
[Encountered Issue: Symptoms and Impact] The PD leader experiences high CPU load at irregular intervals every day.
[Resource Configuration] 48 cores, 256 GB memory, 2 × 3 TB NVMe
This is a cold-data storage node that normally receives no access traffic. Every time monitoring shows high load on it, we have confirmed that PD is the cause.
pd.log (790.0 KB)
Check the monitoring on the PD page.
From the logs, it appears that most of the activity is related to write hotspot region scheduling. Could it be that a certain table has a severe write hotspot, causing PD to schedule frequently?
[operator="move-hot-write-peer {mv peer: store [15] to [274433474]}
The timing doesn’t match.
The CPU usage does appear to come from the scheduler, but the timing doesn’t match my write operations.
You can check the PD monitoring to see if the changes in regions and leaders match the CPU usage. Check if there is a lot of scheduling during CPU peak periods. It could be caused by hotspots, or by adding or removing nodes.
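If it helps to line the log up against the CPU graph, here is a rough Python sketch that counts scheduling operators per minute from PD log lines. It assumes each line starts with a timestamp like `[2023/03/01 02:15:07.123 +08:00]` and carries an `operator="<name> {...}` field like the excerpt quoted above. The function name `count_operators_per_minute` and the regular expressions are my own for illustration, not part of any PD tooling, so adjust them if your log lines look different.

```python
import re
from collections import Counter

# Assumption: lines begin with "[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ]".
# We keep only the part up to the minute so counts are bucketed per minute.
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}")

# Assumption: the operator name follows 'operator=' after any mix of
# quotes/backslashes (covers operator=", operator="" and operator="\").
OP_RE = re.compile(r'operator=[\\"]*([a-z][a-z-]*)')


def count_operators_per_minute(lines):
    """Return a Counter keyed by (minute, operator_name)."""
    counts = Counter()
    for line in lines:
        ts = TS_RE.match(line)
        op = OP_RE.search(line)
        if ts and op:  # skip lines without a timestamp or operator field
            counts[(ts.group(1), op.group(1))] += 1
    return counts
```

To use it, pass the open log file, for example `count_operators_per_minute(open("pd.log", encoding="utf-8"))`, and print `counts.most_common(20)`. If the busiest minutes for `move-hot-write-peer` or the balance operators coincide with the CPU peaks, scheduling is a likely suspect. If they don't, the cause is probably elsewhere.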
Are any other applications co-deployed on node 10.1.3.121? Is there a scheduled task that runs every 12 hours?
The interval is not a fixed 12 hours, and the load is indeed caused by PD.
Check the monitoring on the PD page.
Post the monitoring and logs for the corresponding time period.
The logs were uploaded at the very beginning.
Did you deploy PD, TiKV, and TiFlash on this node? Are there any others?
Yes, but it is indeed PD that is causing the high load.
So far, I have tried shutting down the services suspected of heavy write activity and disabling auto analyze, but the issue persists.
Help needed, bumping the thread.