[TiDB][TiFlash] CPU Utilization Not Maximized in High Concurrency Scenarios

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【TIDB】【TIFLASH】高并发场景下CPU利用率不能打满

| username: TiDBer_abThS2LT

[TiDB Usage Environment] Testing
[TiDB Version] v6.5.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Test SQL and data volume used

TPCH SF 10 data volume.

select l_suppkey from lineitem where l_shipdate >= '1996-12-01' and l_shipdate < date_add('1996-12-01', interval '1' month);

The plan is shown in the figure below

Physical topology
A cluster built with 3 TiFlash nodes.

Issue Phenomenon
With 40 concurrent queries, CPU utilization fluctuates around 10% and does not rise. Increasing the concurrency further leaves CPU usage about the same. CPU usage is high only at the start of the stress test, mainly due to thread creation, and then it drops.

All system parameters use default settings.

| username: WalterWj | Original post link

This version should have a shared thread pool. In theory, it can achieve very high performance. :thinking:

| username: Running | Original post link

Try increasing the concurrency.

| username: TiDBer_abThS2LT | Original post link

Currently, even with the shared thread pool enabled, CPU utilization still cannot be maxed out under high concurrency. There must be a bottleneck somewhere.

| username: TiDBer_abThS2LT | Original post link

Increasing concurrency has little effect. If it is set too high, more threads are created, which leads to more frequent context switching and lower CPU utilization.

| username: 裤衩儿飞上天 | Original post link

Try adding another stress-testing machine.

| username: TiDBer_abThS2LT | Original post link

I started two stress-testing programs on different nodes, both targeting the same cluster. The results are still not ideal, so the bottleneck appears to be on the target cluster side.

| username: Lucien-卢西恩 | Original post link

After increasing the load, does the CPU usage of TiFlash increase linearly? Please describe the testing process and the observed increase in TiFlash CPU usage.
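
One way to answer this is to step the concurrency up in stages and record the throughput at each level, then compare it with the TiFlash CPU panels over the same time windows. Below is a minimal Python sketch of such a closed-loop sweep. It is not the tool used in this thread: `run_query` is a hypothetical placeholder that only sleeps, and it would need to be replaced with a real call that sends the test SQL to TiDB through whatever client driver you use. The function names and concurrency levels are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query():
    """Hypothetical stand-in for executing the test SQL against TiDB.

    Here it only sleeps; replace the body with a real client call.
    """
    time.sleep(0.01)


def measure_throughput(query_fn, concurrency, queries_per_worker):
    """Closed-loop test: each worker issues its queries back to back."""

    def worker():
        for _ in range(queries_per_worker):
            query_fn()
        return queries_per_worker

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        completed = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    # Return the number of finished queries and queries per second.
    return completed, completed / elapsed


def sweep(query_fn, levels, queries_per_worker=20):
    """Run the test once per concurrency level and collect the results."""
    rows = []
    for concurrency in levels:
        completed, qps = measure_throughput(
            query_fn, concurrency, queries_per_worker
        )
        rows.append((concurrency, completed, qps))
    return rows


if __name__ == "__main__":
    for concurrency, completed, qps in sweep(run_query, [1, 2, 4, 8]):
        print(f"concurrency={concurrency:3d} completed={completed:4d} qps={qps:8.1f}")
```

If throughput stops growing at some concurrency level while TiFlash CPU stays near 10%, that level is a reasonable place to start looking for the bottleneck. If throughput keeps growing in proportion to concurrency, the load generator itself may simply not be producing enough load.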