TiKV unified-read-po threads consume too much CPU

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV unified-read-po线程占用过多CPU资源

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.7
[Encountered Problem: Problem Phenomenon and Impact] The unified read pool on multiple TiKV instances is occupying dozens of CPU cores.
[Attachment: Screenshot/Log/Monitoring]
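For context: the thread name shows up in `top` as `unified-read-po` because Linux limits thread names to 15 characters, so `unified-read-pool` is truncated. A quick way to watch these threads on a TiKV host (a minimal sketch, assuming one `tikv-server` process is enough to spot-check):

```bash
# Pick a tikv-server PID and list its busiest threads in batch mode.
# "unified-read-po" is "unified-read-pool" cut to Linux's 15-char limit.
pid=$(pgrep -f tikv-server | head -1)
top -b -n 1 -H -p "$pid" | grep unified-read-po | head -20
```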

| username: 大飞哥online | Original post link

By default, the unified read pool’s maximum thread count (`readpool.unified.max-thread-count`) is 80% of the machine’s CPU count.
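To confirm the value each TiKV instance is actually running with, one option is TiDB’s `SHOW CONFIG` statement (a sketch; host, port, and credentials are placeholders):

```bash
# Query the live readpool setting from every TiKV instance in the cluster.
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SHOW CONFIG WHERE type='tikv' AND name='readpool.unified.max-thread-count';"
```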

| username: 大飞哥online | Original post link

If a machine runs multiple instances, you should adjust it so that all instances combined do not exceed 80% of the machine’s cores.

| username: residentevil | Original post link

One machine runs four TiKV instances; the machine has 96 cores in total, and the read pool is currently configured with 76 cores.

| username: residentevil | Original post link

I have reviewed this suggestion, including the TiKV thread pool optimizations. After the adjustments, they currently seem to have no effect.

| username: 大飞哥online | Original post link

96 cores in total; 80% is about 76 cores. With four instances, that is 76/4 = 19 cores each.
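For a tiup-managed cluster, applying that per-instance cap could look like this (a sketch; the cluster name `prod` is a placeholder):

```bash
# Open the topology in an editor and, under the tikv config section
# (or per-instance config), set: readpool.unified.max-thread-count: 19
tiup cluster edit-config prod

# Push the change to the TiKV nodes only, with a rolling restart.
tiup cluster reload prod -R tikv
```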

| username: residentevil | Original post link

I’ve now adjusted it to 15 cores and am still observing. Normally this read pool shouldn’t be a bottleneck; my stress test runs at less than 4,000 QPS.

| username: 大飞哥online | Original post link

Well, if the read-request load is heavy, the CPU usage will be higher.

| username: residentevil | Original post link

I don’t know whether this parameter has been optimized in v6.5 or v7.1.

| username: 大飞哥online | Original post link

Judging from the official documentation, it seems not.

| username: Soysauce520 | Original post link

Check whether NUMA core binding is in place; without it, multiple instances on one machine can contend for CPU. You should also check the Top SQL view.

| username: residentevil | Original post link

How can I check if NUMA binding has been done?

| username: residentevil | Original post link

Got it. lscpu shows that the cores are already bound to NUMA nodes.
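For anyone else checking, a sketch of the usual inspection commands:

```bash
lscpu | grep -i numa        # NUMA node count and per-node CPU ranges
numactl --hardware          # node/CPU/memory layout

# Show the CPU affinity of each tikv-server; a restricted list
# (rather than all of 0-95) indicates core binding is in effect.
for pid in $(pgrep tikv-server); do
  taskset -cp "$pid"
done
```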

| username: Fly-bird | Original post link

What do the database load and resource utilization look like?

| username: 像风一样的男子 | Original post link

Some experts have written columns on this that you can look up.

| username: residentevil | Original post link

Let me ask you a question: did you run performance stress tests before adopting TiDB? For example, based on your business scenarios, stress-test TiDB with read requests at concurrency levels of 10, 30, 50, 100, and so on, and observe the various monitoring metrics. When I ran a stress test yesterday, I found the read pool’s thread load was extremely high, driving the CPU of the physical machine running TiKV to nearly full utilization. After reducing the threads, TiKV’s CPU usage dropped, but SQL execution time increased.
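A sketch of such a concurrency sweep with sysbench (connection details are placeholders; run the `prepare` step once with the same table options before the first `run`):

```bash
# Read-only workload at increasing thread counts; watch TiKV CPU and
# latency on the monitoring dashboards while each run is in progress.
for t in 10 30 50 100; do
  sysbench oltp_read_only \
    --mysql-host=127.0.0.1 --mysql-port=4000 --mysql-user=root \
    --mysql-db=sbtest --tables=16 --table-size=1000000 \
    --threads="$t" --time=300 --report-interval=10 \
    run
done
```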

| username: 像风一样的男子 | Original post link

TiDB is very demanding on disk performance, so first test the disks. I have a document on disk performance testing: Disk Performance Testing Standards.docx (18.3 KB). Next comes cluster testing; the official docs recommend sysbench for stress testing, and the monitoring will show you your cluster’s limits and bottlenecks.
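As a rough illustration (not the attached document’s criteria), a common fio spot-check for the TiKV data disk looks like this; the file path is a placeholder:

```bash
# Random-read IOPS/latency check with direct I/O, similar in spirit to the
# pre-production disk checks TiDB deployment guides recommend.
fio --name=randread --filename=/data/fio-test.bin --size=4G \
    --rw=randread --bs=8k --ioengine=libaio --direct=1 --iodepth=32 \
    --runtime=60 --time_based --group_reporting
```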

| username: 像风一样的男子 | Original post link

You can check the slow queries in the dashboard to see where the SQL execution time has increased, continuously identify bottlenecks, and make corresponding optimizations.
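Besides the dashboard, the same data can be pulled over SQL (a sketch; connection details are placeholders):

```bash
# Top 10 recent statements by execution time from TiDB's slow query table.
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SELECT time, query_time, query
   FROM information_schema.slow_query
   ORDER BY query_time DESC LIMIT 10;"
```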

| username: residentevil | Original post link

You’re talking about the TiKV module consuming disk space, right?