tidb_max_tiflash_threads=1, but CPU usage is still too high

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb_max_tiflash_threads=1,但CPU使用还是太多

| username: 人如其名

[TiDB Usage Environment] PoC
[TiDB Version] v7.0.0

The tidb_max_tiflash_threads parameter

tidb_max_tiflash_threads, introduced in v6.1.0:

  • Scope: SESSION | GLOBAL
  • Persisted to cluster: Yes
  • Type: Integer
  • Default value: -1
  • Range: [-1, 256]
  • Unit: Threads
  • Sets the maximum concurrency for request execution in TiFlash. The default value -1 means this variable has no effect; 0 means the value is set automatically by TiFlash.

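Setting it looks like this (standard system-variable syntax, shown here for reference):

-- Limit TiFlash concurrency for the current session
SET SESSION tidb_max_tiflash_threads = 1;
-- Or persist it cluster-wide:
SET GLOBAL tidb_max_tiflash_threads = 1;
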
When running the TPC-H statements with this parameter set to 1, TiFlash still uses far more than one CPU, which suggests the setting is not capping CPU usage effectively.
The testing method is as follows:

[tidb@host0 ~]$ time tiup bench tpch -D tpch30 -H 192.168.31.201 -P 4000 -U root -p '' --sf=30 prepare -T 3  --tiflash-replica 1 --analyze --dropdata
tiup is checking updates for component bench ...
Starting component `bench`: /home/tidb/.tiup/components/bench/v1.12.0/tiup-bench tpch -D tpch30 -H 192.168.31.201 -P 4000 -U root -p  --sf=30 prepare -T 3 --tiflash-replica 1 --analyze --dropdata
DROP TABLE IF EXISTS lineitem
DROP TABLE IF EXISTS partsupp
DROP TABLE IF EXISTS supplier
DROP TABLE IF EXISTS part
DROP TABLE IF EXISTS orders
DROP TABLE IF EXISTS customer
DROP TABLE IF EXISTS region
DROP TABLE IF EXISTS nation
creating nation
creating tiflash replica for nation
creating region
creating tiflash replica for region
creating part
creating tiflash replica for part
creating supplier
creating tiflash replica for supplier
creating partsupp
creating tiflash replica for partsupp
creating customer
creating tiflash replica for customer
creating orders
creating tiflash replica for orders
creating lineitem
creating tiflash replica for lineitem
generating nation table
generate nation table done
generating region table
generate region table done
generating customers table
generate customers table done
generating suppliers table
generate suppliers table done
generating part/partsupplier tables
generate part/partsupplier tables done
generating orders/lineitem tables
generate orders/lineitem tables done
analyzing table lineitem
analyze table lineitem done
analyzing table partsupp
analyze table partsupp done
analyzing table supplier
analyze table supplier done
analyzing table part
analyze table part done
analyzing table orders
analyze table orders done
analyzing table customer
analyze table customer done
analyzing table region
analyze table region done
analyzing table nation
analyze table nation done
Finished

real    137m29.384s
user    20m56.294s
sys     5m57.781s

mysql> select * from information_schema.tiflash_replica;
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| tpch30       | nation     |      102 |             1 |                 |         1 |        1 |
| tpch30       | customer   |      117 |             1 |                 |         1 |        1 |
| tpch30       | supplier   |      111 |             1 |                 |         1 |        1 |
| tpch30       | partsupp   |      114 |             1 |                 |         1 |        1 |
| tpch30       | region     |      105 |             1 |                 |         1 |        1 |
| tpch30       | part       |      108 |             1 |                 |         1 |        1 |
| tpch30       | orders     |      120 |             1 |                 |         1 |        1 |
| tpch30       | lineitem   |      124 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
8 rows in set (0.01 sec)

# On the (single) TiFlash node, capture per-second top snapshots of the
# TiFlash process, plus vmstat, to monitor CPU usage during the test
nohup top -p `pidof tiflash` -d 1 -n 86400 -b >top.txt &
nohup vmstat -tw 1 86400 >vmstat.txt &

# Parse the top output into one line per snapshot (time, process %CPU, %MEM)
# so it can be merged with statement execution times, making TiFlash
# consumption during each statement easy to see
cat top.txt | awk '$0 ~ /^top -/ {print time,tiflash_cpu_ratio,tiflash_mem_ratio; time=$3} $0 ~ /TiFlash/ {tiflash_cpu_ratio=$9; tiflash_mem_ratio=$10}'
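
Each parsed line is one top snapshot: snapshot time, TiFlash process %CPU, and %MEM. The values here are hypothetical, shown only to illustrate the format:

21:18:31 580.0 37.5
21:18:32 859.0 37.5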

Test SQL1-22 of TPC-H, from the perspective of tidb_max_tiflash_threads and parallel (the number of concurrent SQLs); a driver sketch follows below.

for eachSQL in [Q1, Q2, ..., Q22]
  for eachTiflashThreads in [0, 1, 2]
    for parallel in [1, 2, 3]
      run `parallel` concurrent copies of eachSQL with tidb_max_tiflash_threads = eachTiflashThreads

Set tidb_enforce_mpp=ON at the session level;
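
A minimal driver sketch of this loop, under assumptions not in the original post: the mysql client is on PATH, each query is saved in a file such as q1.sql (hypothetical name), and root has no password as in the prepare step above. It times the whole round rather than one tracked copy:

#!/bin/bash
for threads in 0 1 2; do
  for parallel in 1 2 3; do
    start=$(date +%s.%N)
    for ((i = 0; i < parallel; i++)); do
      { echo "SET tidb_enforce_mpp = ON;"
        echo "SET tidb_max_tiflash_threads = ${threads};"
        cat q1.sql
      } | mysql -h 192.168.31.201 -P 4000 -u root tpch30 >/dev/null &
    done
    wait   # all concurrent copies finish before the next round starts
    end=$(date +%s.%N)
    echo "threads=${threads} parallel=${parallel} timeSpend=$(echo "${end} - ${start}" | bc)"
  done
done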

The test data obtained is as follows, where:

  • SQLN: the TPC-H test statement.
  • query_start_time: the start time of the statement. If parallel > 1, one SQL is selected and its start time recorded before the request is sent to the database.
  • query_end_time: the end time of the statement. If parallel > 1, the end time of that same selected SQL is recorded after the database returns its result.
  • tidb_max_tiflash_threads: the concurrent-thread limit configured in TiFlash for a single SQL statement. 0 means the system default (this host: CPU=12, on-line CPU(s) list 0-11, 1 thread per core).
  • parallel: the number of concurrent SQLs. For example, parallel=2 issues two SQL requests simultaneously and waits for both to finish.
  • timeSpend(seconds): query_end_time - query_start_time.
  • cpuSlice: the TiFlash process CPU usage sampled once per second by top on the TiFlash host, one value per second from query_start_time to query_end_time.
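
To read cpuSlice: top reports process CPU as a percentage of one core, so this 12-core host tops out near 1200. A quick check against the SQL1 row with tidb_max_tiflash_threads=1 and parallel=1 below:

  expected: ~100 (one thread ≈ one core)
  observed: ~220 in steady state, i.e. 220 / 100 ≈ 2.2 cores

which is exactly the discrepancy this topic is asking about.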

SQL1     query_start_time:2023-05-27-21:18:30  query_end_time:2023-05-27-21:18:40  tidb_max_tiflash_threads:0     parallel:1     timeSpend(seconds):10.716679  cpuSlice:[6,580,859,853,876,846,881,816,898,839]
SQL1     query_start_time:2023-05-27-21:18:40  query_end_time:2023-05-27-21:18:59  tidb_max_tiflash_threads:0     parallel:2     timeSpend(seconds):19.039483  cpuSlice:[831,717,854,907,1006,926,969,997,928,954,950,984,966,908,931,899,891,878,948]
SQL1     query_start_time:2023-05-27-21:18:59  query_end_time:2023-05-27-21:19:27  tidb_max_tiflash_threads:0     parallel:3     timeSpend(seconds):27.265477  cpuSlice:[884,786,858,945,915,946,996,982,951,916,939,960,946,888,946,937,914,928,926,939,953,939,876,1002,973,994,955]
SQL1     query_start_time:2023-05-27-21:19:27  query_end_time:2023-05-27-21:19:44  tidb_max_tiflash_threads:1     parallel:1     timeSpend(seconds):17.038484  cpuSlice:[894,607,237,235,236,231,217,223,230,225,218,217,186,207,225,231,221]
SQL1     query_start_time:2023-05-27-21:19:44  query_end_time:2023-05-27-21:20:05  tidb_max_tiflash_threads:1     parallel:2     timeSpend(seconds):20.997980  cpuSlice:[217,303,485,476,471,446,466,396,475,426,453,457,414,457,389,361,410,449,474,433,449]
SQL1     query_start_time:2023-05-27-21:20:05  query_end_time:2023-05-27-21:20:39  tidb_max_tiflash_threads:1     parallel:3     timeSpend(seconds):33.978577  cpuSlice:[436,454,565,540,546,568,587,555,546,574,579,560,591,600,548,588,553,539,568,573,577,587,566,537,515,584,560,565,587,548,576,579,575,610]
SQL1     query_start_time:2023-05-27-21:20:39  query_end_time:2023-05-27-21:20:49  tidb_max_tiflash_threads:2     parallel:1     timeSpend(seconds):10.312490  cpuSlice:[548,501,474,453,467,450,463,430,423,450]
SQL1     query_start_time:2023-05-27-21:20:49  query_end_time:2023-05-27-21:21:14  tidb_max_tiflash_threads:2     parallel:2     timeSpend(seconds):24.691146  cpuSlice:[450,435,531,524,546,548,568,553,528,556,550,550,533,509,528,541,561,585,491,531,526,555,582,526,556]
SQL1     query_start_time:2023-05-27-21:21:14  query_end_time:2023-05-27-21:21:49  tidb_max_tiflash_threads:2     parallel:3     timeSpend(seconds):34.911054  cpuSlice:[578,672,544,546,539,559,595,543,583,577,557,587,564,605,571,563,571,563,555,587,550,568,596,591,707,579,532,513,544,567,574,557,563,577,564]
SQL2     query_start_time:2023-05-27-21:21:49  query_end_time:2023-05-27-21:21:52  tidb_max_tiflash_threads:0     parallel:1     timeSpend(seconds):3.210497   cpuSlice:[535,846,1038]
SQL2     query_start_time:2023-05-27-21:21:52  query_end_time:2023-05-27-21:21:58  tidb_max_tiflash_threads:0     parallel:2     timeSpend(seconds):6.319045   cpuSlice:[886,730,971,1095,1055]
SQL2     query_start_time:2023-05-27-21:21:58  query_end_time:2023-05-27-21:22:07  tidb_max_tiflash_threads:0     parallel:3     timeSpend(seconds):9.322105   cpuSlice:[1018,901,529,816,925,1080,1143,1050,1032]
SQL2     query_start_time:2023-05-27-21:22:08  query_end_time:2023-05-27-21:22:12  tidb_max_tiflash_threads:1     parallel:1     timeSpend(seconds):4.223496   cpuSlice:[972,681,589,606]
SQL2     query_start_time:2023-05-27-21:22:12  query_end_time:2023-05-27-21:22:18  tidb_max_tiflash_threads:1     parallel:2     timeSpend(seconds):6.694010   cpuSlice:[414,274,831,809,893,926]
SQL2     query_start_time:2023-05-27-21:22:19  query_end_time:2023-05-27-21:22:28  tidb_max_tiflash_threads:1     parallel:3     timeSpend(seconds):9.434295   cpuSlice:[563,444,886,886,915,950,924,952,876]
SQL2     query_start_time:2023-05-27-21:22:28  query_end_time:2023-05-27-21:22:32  tidb_max_tiflash_threads:2     parallel:1     timeSpend(seconds):3.682497   cpuSlice:[763,490,594,783]
SQL2     query_start_time:2023-05-27-21:22:32  query_end_time:2023-05-27-21:22:39  tidb_max_tiflash_threads:2     parallel:2     timeSpend(seconds):6.624494   cpuSlice:[756,430,871,905,930,726,742]
SQL2     query_start_time:2023-05-27-21:22:39  query_end_time:2023-05-27-21:22:49  tidb_max_tiflash_threads:2     parallel:3     timeSpend(seconds):9.999767   cpuSlice:[567,646,972,883,968,838,765,716,741,745]
SQL3     query_start_time:2023-05-27-21:22:49  query_end_time:2023-05-27-21:23:00  tidb_max_tiflash_threads:0     parallel:1     timeSpend(seconds):11.033197  cpuSlice:[599,665,850,794,783,830,787,754,768,727,760]
SQL3     query_start_time:2023-05-27-21:23:00  query_end_time:2023-05-27-21:23:20  tidb_max_tiflash_threads:0     parallel:2     timeSpend(seconds):19.828656  cpuSlice:[346,618,945,921,992,882,915,855,895,850,790,890,825,830,838,823,821,852,916,825]
SQL3     query_start_time:2023-05-27-21:23:20  query_end_time:2023-05-27-21:23:48  tidb_max_tiflash_threads:0     parallel:3     timeSpend(seconds):27.750755  cpuSlice:[211,748,895,920,952,988,984,880,834,928,898,942,982,924,989,902,886,977,949,902,912,919,888,944,879,911,942]
SQL3     query_start_time:2023-05-27-21:23:48  query_end_time:2023-05-27-21:24:01  tidb_max_tiflash_threads:1     parallel:1     timeSpend(seconds):13.542488  cpuSlice:[638,134,213,195,203,206,315,460,420,439,444,375,440]
SQL3     query_start_time:2023-05-27-21:24:01  query_end_time:2023-05-27-21:24:24  tidb_max_tiflash_threads:1     parallel:2     timeSpend(seconds):23.071479  cpuSlice:[406,63,426,384,363,399,349,365,454,479,477,486,511,482,482,502,494,529,524,505,497,511,522]
SQL3     query_start_time:2023-05-27-21:24:24  query_end_time:2023-05-27-21:24:57  tidb_max_tiflash_threads:1     parallel:3     timeSpend(seconds):32.232471  cpuSlice:[492,184,495,567,600,530,592,552,532,542,693,503,531,515,527,536,532,483,506,529,519,499,520,477,550,630,520,479,503,511,533,513,526]
SQL3     query_start_time:2023-05-27-21:24:57  query_end_time:2023-05-27-21:25:09  tidb_max_tiflash_threads:2     parallel:1     timeSpend(seconds):12.274989  cpuSlice:[398,189,403,381,403,465,464,470,450,465,474,451]
SQL3     query_start_time:2023-05-27-21:25:09  query_end_time:2023-05-27-21:25:32  tidb_max_tiflash_threads:2     parallel:2     timeSpend(seconds):23.244982  cpuSlice:[467,127,712,742,666,674,670,675,542,470,469,510,489,442,484,504,500,513,567,446,474,483,483]
SQL3     query_start_time:2023-05-27-21:25:32  query_end_time:2023-
| username: tidb菜鸟一只 | Original post link

This refers to the concurrency of a single request, right?

| username: lilinghai | Original post link

When TiFlash runs a query using MPP, the query is split into multiple MPP tasks, and the maximum number of threads used by each MPP task is controlled by max_threads.

| username: 人如其名 | Original post link

However, I only have one TiFlash node in this cluster. Shouldn't there be only one MPP task on a single TiFlash node?

| username: lilinghai | Original post link

The number of MPP tasks for a query depends on the complexity of the query and the number of TiFlash nodes. An MPP query is divided into several stages (fragments) based on the execution plan, and each fragment is divided into multiple MPP tasks (one for each TiFlash node).
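
A hedged way to see this structure for any of the queries above: in an MPP plan, each ExchangeSender operator marks the root of one fragment, and each fragment runs as one MPP task per TiFlash node. For example (illustrative query against the schema above):

EXPLAIN SELECT l_orderkey, SUM(l_extendedprice) FROM lineitem GROUP BY l_orderkey;
-- Count the ExchangeSender operators in the output: one per fragment.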

| username: 人如其名 | Original post link

But my TiDB cluster only has one TiFlash node.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.