Why does Projection_9 in explain analyze have Concurrency:5?

translator_bot · June 22, 2024, 12:35am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: explain analyze 中的Projection_9 为什么会有Concurrency:5 ?

| username: Raymond

Dear teachers, I have a question regarding the experiment on the customer and orders tables under the tpch database. I noticed that the execution plan includes a Projection_9 with Concurrency:5 in the explain analyze output. I don’t quite understand this, as Projection is supposed to be projection, right?

mysql> explain analyze select cname, sum(oprice) from ( select c.C_NAME as 'cname', o.O_CUSTKEY as 'oid', o.O_TOTALPRICE as 'oprice' from customer c join orders o on c.C_CUSTKEY = o.O_CUSTKEY) t1 group by cname;

id	estRows	actRows	task	access object	execution info	operator info	memory	disk
Projection_9	148224.00	99996	root		time:2.31s, loops:99, Concurrency:5	tpch2.customer.c_name, Column#18	743.6 KB	N/A
└─HashAgg_10	148224.00	99996	root		time:2.31s, loops:99	group by:tpch2.customer.c_name, funcs:sum(tpch2.orders.o_totalprice)->Column#18, funcs:firstrow(tpch2.customer.c_name)->tpch2.customer.c_name	22.2 MB	0 Bytes
└─HashJoin_21	1503229.30	1498900	root		time:1.09s, loops:1466, build_hash_table:{total:61ms, fetch:30.2ms, build:30.8ms}, probe:{concurrency:5, total:11.4s, max:2.29s, probe:7.77s, fetch:3.67s}	inner join, equal:[eq(tpch2.customer.c_custkey, tpch2.orders.o_custkey)]	16.2 MB	0 Bytes
├─TableReader_25(Build)	150000.00	150000	root		time:30.3ms, loops:149, cop_task: {num: 11, max: 17ms, min: 615.9µs, avg: 4.4ms, p95: 17ms, max_proc_keys: 3040, p95_proc_keys: 3040, tot_proc: 1ms, tot_wait: 1ms, rpc_num: 11, rpc_time: 48.3ms, copr_cache_hit_ratio: 0.64, distsql_concurrency: 15}	data:TableFullScan_24	2.72 MB	N/A
│ └─TableFullScan_24	150000.00	150000	cop[tikv]	table:c	tikv_task:{proc max:20ms, min:0s, avg: 5.73ms, p80:13ms, p95:20ms, iters:190, tasks:11}, scan_detail: {total_process_keys: 4736, total_process_keys_size: 967579, total_keys: 4740, get_snapshot_time: 1.46ms, rocksdb: {key_skipped_count: 4736, block: {cache_hit_count: 47}}}	keep order:false	N/A	N/A
└─TableReader_23(Probe)	1498900.00	1498900	root		time:137.5ms, loops:1468, cop_task: {num: 55, max: 372.8ms, min: 1.72ms, avg: 52.6ms, p95: 140.1ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 98ms, tot_wait: 3ms, rpc_num: 55, rpc_time: 2.89s, copr_cache_hit_ratio: 0.69, distsql_concurrency: 15}	data:TableFullScan_22	9.19 MB	N/A
└─TableFullScan_22	1498900.00	1498900	cop[tikv]	table:o	tikv_task:{proc max:61ms, min:0s, avg: 19.4ms, p80:36ms, p95:43ms, iters:1682, tasks:55}, scan_detail: {total_process_keys: 181750, total_process_keys_size: 27525710, total_keys: 183409, get_snapshot_time: 5.01ms, rocksdb: {delete_skipped_count: 281, key_skipped_count: 183673, block: {cache_hit_count: 588}}}	keep order:false	N/A	N/A

7 rows in set (2.31 sec)

translator_bot · June 22, 2024, 12:35am

| username: redgame | Original post link

The number of concurrent threads used when performing this operation, making full use of the system’s multi-core processing capabilities to improve query performance.

translator_bot · June 22, 2024, 12:35am

| username: tidb菜鸟一只 | Original post link

The concurrency of the projection operator can be set using the parameter tidb_projection_concurrency in versions before 4. Starting from version 5.0, this parameter has been deprecated and replaced by the tidb_executor_concurrency parameter to uniformly set the concurrency of various SQL operators.

translator_bot · June 22, 2024, 12:35am

| username: Raymond | Original post link

What is the significance of concurrency in projection?

translator_bot · June 22, 2024, 12:35am

| username: cassblanca | Original post link

Brother, did you understand projection as something similar to MongoDB’s projection? TiDB’s projection operator is generally used for expression evaluation. In theory, with sufficient resources, parallel computation should be more efficient than single-process computation, right?

translator_bot · June 22, 2024, 12:35am

| username: Raymond | Original post link

Brother, isn’t there a dedicated operator for expression evaluation, like hashagg? I think it should be what you mentioned, projection shouldn’t be just a simple projection.

translator_bot · June 22, 2024, 12:35am

| username: 人如其名 | Original post link

When there are many return fields and the calculations are complex, projection may become a bottleneck. Even in simple test statements, you can see some performance differences. If you encounter very complex calculations in actual production, the probability of projection becoming a bottleneck will be greater, so concurrency is still very necessary. If the return fields have no calculations, generally speaking, enabling or disabling concurrency has little impact on performance.

translator_bot · June 22, 2024, 12:35am

| username: Raymond | Original post link

I can understand your example, but in my case, the number of returned fields is relatively small, and does it make much sense to use 5 concurrent threads for such a simple Projection operation?

translator_bot · June 22, 2024, 12:35am

| username: 人如其名 | Original post link

When a single thread is fast enough, using five threads won’t consume too many resources. When a single thread isn’t fast enough, five threads will try to use resources to improve speed. Therefore, whether it makes sense or not, it doesn’t harm us as users, nor does it waste CPU resources due to high concurrency without efficiency gains.

translator_bot · June 22, 2024, 12:35am

| username: Raymond | Original post link

Alright, thank you, boss.

translator_bot · June 22, 2024, 12:35am

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.