TiDB-service CPU Spikes to 99% Without Apparent Reason

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-service cpu无缘无故飙升99%

| username: TiDB_老表

【TiDB Usage Environment】Production Environment
【TiDB Version】6.5.0
【Reproduction Path】None
【Encountered Problem: Phenomenon and Impact】
【Resource Configuration】Navigate to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】



goroutine_pd_172.16.215.211_2379_2724836510.txt (92.9 KB)
goroutine_tidb_172.16.215.211_4000_2957135535.txt (99.1 KB)
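
Goroutine dumps like the two attached can be summarized by goroutine state before reading them line by line. The following is only a minimal sketch: it assumes the files use Go's `debug=2` text format, where each goroutine starts with a header such as `goroutine 42 [select, 5 minutes]:`. The function name is made up for illustration.

```python
import re
from collections import Counter

# Header of one goroutine in Go's debug=2 text dump (assumed format), e.g.
# "goroutine 42 [chan receive, 5 minutes]:"
HEADER = re.compile(r"^goroutine \d+ \[([^\],]+)")


def count_goroutine_states(dump_text):
    """Count goroutines per state (running, select, IO wait, ...)."""
    states = Counter()
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            # group(1) is the state only; the optional wait time is dropped
            states[match.group(1)] += 1
    return states
```

Calling `count_goroutine_states(open(path).read()).most_common()` on a dump lists the most frequent states first. A large number of `running` or `runnable` goroutines would point at CPU-bound work inside the process rather than waiting.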

| username: xfworld | Original post link

Is the CPU usage really that high even without any queries?

| username: Fly-bird | Original post link

Let’s see if there are any queries.

| username: 大飞哥online | Original post link

Check what that machine was doing at the time of the spike.

| username: 像风一样的男子 | Original post link

This kind of single-node memory and CPU spike is most likely caused by slow queries. You can check Top SQL in TiDB Dashboard to troubleshoot.

| username: 路在何chu | Original post link

The latency at this time seems a bit high. Check whether it is caused by I/O: look at the slow logs and the I/O load from that time.

| username: tidb菜鸟一只 | Original post link

It’s highly likely that several large SQL queries were run at the corresponding time. Check Top SQL.
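
Besides Top SQL, the slow query records for the spike window can be pulled with SQL from `INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY`. The sketch below only builds the statement and its parameters. The helper name, the `%s` placeholder style (as used by drivers such as PyMySQL), and the time strings you pass in are assumptions, not something taken from this cluster.

```python
def slow_query_sql(start, end, instance=None, limit=20):
    """Build a query for the slowest statements between start and end.

    start/end: 'YYYY-MM-DD HH:MM:SS' strings for the spike window.
    instance:  optional value of the INSTANCE column, to restrict the
               search to the one tidb-server that shows the spike.
    Returns (sql, params) for a driver that uses %s placeholders.
    """
    sql = (
        "SELECT INSTANCE, Time, Query_time, Digest, "
        "LEFT(Query, 200) AS query_head "
        "FROM INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY "
        "WHERE Time BETWEEN %s AND %s"
    )
    params = [start, end]
    if instance is not None:
        sql += " AND INSTANCE = %s"
        params.append(instance)
    sql += " ORDER BY Query_time DESC LIMIT %s"
    params.append(int(limit))
    return sql, tuple(params)
```

The result can be passed to something like `cursor.execute(sql, params)`. An empty result for the spike window would support the observation that no slow SQL is involved.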

| username: TiDB_老表 | Original post link

I checked and found no slow SQL and no concurrent business workload.

| username: TiDB_老表 | Original post link

Only tidb-service shows high CPU usage; no other process does.

| username: TiDB_老表 | Original post link

Checked, there is no problematic SQL.

| username: 大飞哥online | Original post link

TiDB has several modules; you can look at each one separately, and also check the slow logs from that time.

| username: 路在何chu | Original post link

Try restarting to see whether the CPU usage drops. Also check whether every TiDB server has the same CPU model.

| username: TiDB_老表 | Original post link

Restarting works, but the issue recurs. The CPU models are the same.

| username: TiDB_老表 | Original post link

Checked everything, couldn’t find any problematic SQL.

| username: 路在何chu | Original post link

Are all three nodes experiencing high CPU usage, or is it just one node?

| username: TiDBer_小阿飞 | Original post link

Slow query or IO_wait.
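
To rule I/O wait in or out on a Linux host, the aggregate `cpu` line of `/proc/stat` can be sampled twice and compared. This is a minimal sketch with illustrative function names; it assumes a Linux kernel that reports at least the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percent of CPU time spent in iowait between two samples."""
    deltas = [new - old for old, new in zip(before, after)]
    # user nice system idle iowait irq softirq steal; the guest counters
    # are already included in user/nice, so they are left out of the sum
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval seconds apart, and compare."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

If `sample_iowait()` stays near zero while the CPU is pegged, the time is being spent in user or system mode, which points away from disk I/O.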

| username: WalterWj | Original post link

Is it used by the tidb-server process?

| username: Miracle | Original post link

Is there anything else deployed on this node?

| username: Inkjade | Original post link

Your environment is a hybrid deployment, with several components on the same host. It is recommended to first identify which specific component has the high CPU usage.

  1. Check for traffic spikes.
  2. Check for slow SQL.
  3. Use the Manual Profiling feature under Advanced Debugging in TiDB Dashboard to find what is driving the sustained CPU usage.
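
If the Dashboard is not convenient, a CPU profile can also be fetched directly from the Go pprof endpoint that tidb-server exposes on its status port. The sketch below assumes the default status port 10080 and an unauthenticated HTTP endpoint; the function names and the output path are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def cpu_profile_url(host, status_port=10080, seconds=30):
    """URL of the Go pprof CPU profile endpoint on a tidb-server.

    10080 is TiDB's default status port; adjust it if the deployment
    uses a different one (assumption, not taken from this cluster).
    """
    query = urlencode({"seconds": int(seconds)})
    return f"http://{host}:{status_port}/debug/pprof/profile?{query}"


def fetch_cpu_profile(host, out_path, status_port=10080, seconds=30):
    """Download a CPU profile; the request blocks while it is sampled."""
    url = cpu_profile_url(host, status_port, seconds)
    with urlopen(url, timeout=seconds + 30) as resp:
        data = resp.read()
    with open(out_path, "wb") as out:
        out.write(data)
    return out_path
```

A file saved this way while the CPU is at 99% can be opened with `go tool pprof` to see which functions consume the time, even when no SQL statement stands out.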

| username: 有猫万事足 | Original post link

For this kind of single-node anomaly, it is recommended to go directly to the Top SQL page and see what this TiDB instance is running.