[TiDB][TiFlash] CPU Utilization Not Maximized in High Concurrency Scenarios

| username: TiDBer_abThS2LT

[TiDB Usage Environment] Testing
[TiDB Version] v6.5.0
[Attachments: Screenshots/Logs/Monitoring]
Test SQL and data volume used

TPC-H data at scale factor 10 (SF 10).

select l_suppkey from lineitem where l_shipdate >= '1996-12-01' and l_shipdate < date_add('1996-12-01', interval '1' month);

The plan is shown in the figure below

Physical topology
A cluster built with 3 TiFlash nodes.

Issue Phenomenon
With 40 concurrent queries, CPU utilization fluctuates around 10% and does not rise. Raising the concurrency further leaves CPU usage about the same. CPU usage is high only at the moment the concurrent stress test starts, mainly because of thread creation, and then it drops.

All system parameters use default settings.
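For readers who want to reproduce this kind of test, here is a minimal sketch of a closed-loop stress driver: a fixed number of workers, each sending its next query as soon as the previous one returns. It is not the tool used in this thread. `run_query` is a hypothetical callback that you would replace with a real call to the cluster, and `fake_query` only sleeps, so the sketch runs without a database.

```python
import threading
import time
from typing import Callable, List


def run_closed_loop(run_query: Callable[[], None],
                    workers: int,
                    duration_s: float) -> List[int]:
    """Start `workers` threads that call `run_query` back-to-back
    until `duration_s` seconds have passed.

    Returns the number of completed queries per worker.
    """
    counts = [0] * workers
    deadline = time.monotonic() + duration_s

    def worker(idx: int) -> None:
        # Each worker only touches its own slot, so no lock is needed.
        while time.monotonic() < deadline:
            run_query()
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def fake_query() -> None:
    # Hypothetical stand-in for a real SQL round trip (assumption:
    # about 10 ms per query). Replace with a real client call.
    time.sleep(0.01)


if __name__ == "__main__":
    duration = 1.0
    per_worker = run_closed_loop(fake_query, workers=40, duration_s=duration)
    total = sum(per_worker)
    print(f"completed {total} queries, about {total / duration:.0f} qps")
```

In a closed loop, each worker has at most one query in flight, so the offered load is capped by the worker count divided by per-query latency. If throughput stops growing as workers are added while server CPU stays low, the time is probably being spent waiting somewhere other than on CPU.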

| username: WalterWj

This version should have a shared thread pool. In theory, it can achieve very high performance. :thinking:

| username: Running

Try increasing the concurrency.

| username: TiDBer_abThS2LT

Even with the shared thread pool enabled, CPU utilization still cannot be driven to its maximum under high concurrency. There must be a bottleneck somewhere.

| username: TiDBer_abThS2LT

Increasing concurrency has little effect. If it is set too high, it only creates more threads, which leads to more frequent context switching and lower CPU utilization.

| username: 裤衩儿飞上天

Try adding another stress-testing machine.

| username: TiDBer_abThS2LT

I started two stress-testing programs on different nodes, both targeting the same cluster. The results are still not ideal, so the bottleneck appears to be on the target cluster side.

| username: Lucien-卢西恩

After increasing the load, does the CPU usage of TiFlash increase linearly? Please describe the testing process and the observed increase in TiFlash CPU usage.
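One way to answer the linearity question is to record CPU usage at several concurrency levels and compare each point against perfect linear scaling from the lowest level. The sketch below does that. The function name and the sample numbers are made up for illustration and are not measurements from this thread.

```python
from typing import List, Tuple


def scaling_efficiency(samples: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Compare CPU growth with load growth.

    `samples` is a list of (concurrency, cpu_percent) pairs sorted by
    concurrency. The first pair is the baseline. For every pair the
    result holds (concurrency, efficiency), where efficiency is
    (cpu / baseline_cpu) / (concurrency / baseline_concurrency).
    A value of 1.0 means CPU grew in proportion to load; values well
    below 1.0 mean the added load did not turn into CPU work.
    """
    base_conc, base_cpu = samples[0]
    result = []
    for conc, cpu in samples:
        cpu_growth = cpu / base_cpu
        load_growth = conc / base_conc
        result.append((conc, cpu_growth / load_growth))
    return result


if __name__ == "__main__":
    # Hypothetical readings, NOT data from this thread.
    readings = [(10, 5.0), (20, 10.0), (40, 10.0)]
    for conc, eff in scaling_efficiency(readings):
        print(f"concurrency={conc:3d} efficiency={eff:.2f}")
```

With these invented readings, efficiency stays at 1.00 up to 20 concurrent queries and falls to 0.50 at 40. A flat CPU curve under rising load would show up as that kind of pattern.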