High CPU Usage in TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV CPU高

| username: 等一分钟

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.5
Only the CPU usage of the TiKV nodes is high, while tidb-server is not. Can we see what TiKV is doing?

| username: Fly-bird | Original post link

Check if there are slow queries on the dashboard.
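
If the Dashboard is not handy, roughly the same check can be done in SQL against the slow query tables. This is just a sketch; the host, port, credentials, and time window are assumptions for illustration:

```shell
# Connect to any tidb-server (address and credentials are assumptions) and list
# recent statements that spent the most time being processed in TiKV.
mysql -h 127.0.0.1 -P 4000 -u root -p -e "
SELECT Time, Query_time, Process_time, Wait_time, Total_keys, LEFT(Query, 80) AS query
FROM information_schema.cluster_slow_query
WHERE Time > NOW() - INTERVAL 30 MINUTE
ORDER BY Process_time DESC
LIMIT 10;"
```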

| username: 等一分钟 | Original post link

[Image from the original post; not included in the translation.]

| username: xfworld | Original post link

How many CPU cores are allocated to each TiKV node? It hasn’t even reached 100%; it’s only using one core.

| username: 等一分钟 | Original post link

16 cores

| username: 等一分钟 | Original post link

It’s going to reach 100%, right? That’s the total CPU, not a single core.

| username: Billmay表妹 | Original post link

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

Let’s take a look at your configuration.

| username: 等一分钟 | Original post link

[Screenshot from the original post; not included in the translation.]

| username: xfworld | Original post link

Linux calculates CPU usage with 1 core = 100%, so if there are 20 cores it can show up to 2000%…

Similar to this: [screenshot from the original post, not included in the translation]
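
For example, checked from the command line (the process name below assumes the default tikv-server): the per-process %CPU in top is summed across cores, so it can exceed 100% on a multi-core host.

```shell
# Press `1` in interactive top to toggle the per-core %Cpu0..%CpuN view.
# In batch mode, show only the tikv-server process; its %CPU is summed across
# cores, so e.g. 400% means roughly 4 of the 16 cores are busy.
top -b -n 1 -p "$(pgrep -d, tikv-server)" | tail -n 3
```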

| username: 等一分钟 | Original post link

The metric in the monitoring is %Cpu(s), on this line.

| username: TiDBer_小阿飞 | Original post link

Check the logs of the TiDB, TiKV, and PD nodes; one or another of them will always tell you what it is doing :smile:
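
For example, a quick pass over the TiKV log for slow-query and error entries. The log path below is an assumption; adjust it to your deployment directory:

```shell
# Log path is an assumption; it is usually <deploy-dir>/log/tikv.log.
grep -E "slow-query|server is busy|ERROR" /data/tikv-20160/log/tikv.log | tail -n 20
```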

| username: 等一分钟 | Original post link

[2023/10/07 14:56:07.818 +08:00] [INFO] [tracker.rs:254] [slow-query] [perf_stats.internal_delete_skipped_count=0] [perf_stats.internal_key_skipped_count=2336] [perf_stats.block_read_byte=11764200] [perf_stats.block_read_count=530] [perf_stats.block_cache_hit_count=74795] [scan.range.first=“Some(start: 74800000000006B8E25F728000000010B67A32 end: 74800000000006B8E25F728000000010B67A33)”] [scan.ranges=4468] [scan.total=6048] [scan.processed_size=1947020] [scan.processed=5400] [scan.is_desc=false] [tag=select] [table_id=440546] [txn_start_ts=444769653101428740] [total_suspend_time=8.387677712s] [total_process_time=1.040620621s] [handler_build_time=810.111µs] [wait_time.snapshot=18.695µs] [wait_time.schedule=20.41388ms] [wait_time=20.432575ms] [total_lifetime=9.449548814s] [remote_host=ipv4:10.0.254.62:36772] [region_id=92143359]

| username: 等一分钟 | Original post link

Can you tell anything from this?

| username: TiDBer_小阿飞 | Original post link

slow-query means a slow query.
[tag=select]
[total_suspend_time=8.387677712s] suspended?
What operation was being performed?
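
From the fields in that line: total_lifetime ≈ 9.45 s, of which total_suspend_time ≈ 8.39 s, total_process_time ≈ 1.04 s, and wait_time ≈ 20 ms, so the request spent most of its life suspended rather than processing, which suggests the coprocessor read pool on that TiKV is busy. To see which table table_id=440546 belongs to, something like the following can be run (connection details are assumptions):

```shell
# Map the table_id from the TiKV log back to a schema/table name.
mysql -h 127.0.0.1 -P 4000 -u root -p -e "
SELECT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.tables
WHERE TIDB_TABLE_ID = 440546;"
```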

| username: 等一分钟 | Original post link

I don’t know; it’s just a section from the log.

| username: TiDBer_小阿飞 | Original post link

Does the txn_start_ts suggest a paused replication task?
Did you create a replication task with TiCDC?

| username: 等一分钟 | Original post link

No, only DM replication.

| username: TiDBer_小阿飞 | Original post link

Then let’s check it together with the dashboard and Prometheus.
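
For example, TiKV thread CPU can be pulled straight from Prometheus; the address below is an assumption (9090 is the default port), and this is roughly the data the Grafana Thread CPU panels are built on:

```shell
# Per-thread-group CPU usage of TiKV over the last minute, grouped by instance.
curl -s 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(tikv_thread_cpu_seconds_total[1m])) by (instance, name)'
```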

| username: 路在何chu | Original post link

Use top -H to see which threads have high CPU usage, then check the Thread CPU panels in Grafana.
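
Something like the sketch below (the process name is assumed to be the default tikv-server). The thread names in the COMMAND column, e.g. grpc-server, raftstore, apply, unified-read-po, correspond to the groups shown in the Grafana Thread CPU panels:

```shell
# Show the busiest threads of the tikv-server process, sorted by CPU.
top -H -b -n 1 -p "$(pgrep -d, tikv-server)" -o %CPU | head -n 25
```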

| username: 路在何chu | Original post link

It’s also possible that your TiKV CPU is genuinely insufficient, since the SQL load all falls on TiKV.
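
To check that, it may help to see which statements consume the most coprocessor time. A rough sketch against the statements summary tables, with connection details and column names assumed per recent TiDB versions:

```shell
# Statements ranked by average coprocessor process time across the cluster.
mysql -h 127.0.0.1 -P 4000 -u root -p -e "
SELECT EXEC_COUNT, AVG_PROCESS_TIME, AVG_TOTAL_KEYS, LEFT(DIGEST_TEXT, 80) AS stmt
FROM information_schema.cluster_statements_summary
ORDER BY AVG_PROCESS_TIME DESC
LIMIT 10;"
```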