[Help] Left Join Query Taking Too Long

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【求助】left join 查询耗时过久

| username: just_coder

【TiDB Usage Environment】Business Testing
【TiDB Version】v5.4.1
【Encountered Problem】Left join query takes too long
【Reproduction Path】The first execution takes 18s; the slow query log shows that the statistics are outdated, and the query still takes 18s after running ANALYZE (see the sketch below).
【Problem Phenomenon and Impact】
The driving table has 12 million rows, and the driven table has 7 million rows.
The query takes about 18s when reading from TiKV and about 30s from TiFlash.
Checking the execution plan shows that the time is spent in the probe phase of the HashJoin operator.
Checking system resources, TiKV CPU usage is not high, TiDB server CPU usage spikes to over 2000%.
Is this query speed normal?
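
(For reference, a minimal sketch of the statistics refresh step mentioned in the reproduction path, assuming the two tables that appear in the execution plan later in the thread; the exact statements that were run are not shown in the post.)

```sql
-- Re-collect statistics on both tables involved in the join, then check how
-- healthy they are. Table names are taken from the execution plan below; the
-- original ANALYZE statements were not included in the post.
ANALYZE TABLE yd_result.yd_material_summary;
ANALYZE TABLE yd_result.yd_material_summary_ltv;

-- A Healthy value close to 100 means the statistics are reasonably up to date.
SHOW STATS_HEALTHY WHERE Db_name = 'yd_result';
```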

| username: just_coder | Original post link

The TiKV execution plan is as follows:

```
Projection_8 2304.50 6377 root time:28.3s, loops:9, Concurrency:8 yd_result.yd_material_summary.appid, yd_result.yd_material_summary.mid, yd_result.yd_material_summary.eid, yd_result.yd_material_summary.mtype, yd_result.yd_material_summary.valid_recdate, Column#168, Column#169, Column#170, Column#171, Column#172, Column#173, Column#174, Column#175, Column#176, Column#177 2.24 MB N/A
└─HashAgg_9 2304.50 6377 root time:28.3s, loops:9, partial_worker:{wall_time:28.26943855s, concurrency:8, task_num:96317, tot_wait:1m17.229505248s, tot_exec:2m28.810792245s, tot_time:3m46.142556324s, max:28.269358296s, p95:28.269358296s}, final_worker:{wall_time:28.291671228s, concurrency:8, task_num:64, tot_wait:3m46.140120196s, tot_exec:169.672801ms, tot_time:3m46.309897564s, max:28.291580971s, p95:28.291580971s} group by:Column#230, Column#231, Column#232, funcs:sum(Column#215)->Column#168, funcs:sum(Column#216)->Column#169, funcs:sum(Column#217)->Column#170, funcs:sum(Column#218)->Column#171, funcs:sum(Column#219)->Column#172, funcs:sum(Column#220)->Column#173, funcs:sum(Column#221)->Column#174, funcs:sum(Column#222)->Column#175, funcs:sum(Column#223)->Column#176, funcs:sum(Column#224)->Column#177, funcs:firstrow(Column#225)->yd_result.yd_material_summary.mid, funcs:firstrow(Column#226)->yd_result.yd_material_summary.eid, funcs:firstrow(Column#227)->yd_result.yd_material_summary.mtype, funcs:firstrow(Column#228)->yd_result.yd_material_summary.valid_recdate, funcs:firstrow(Column#229)->yd_result.yd_material_summary.appid 28.6 MB N/A
  └─Projection_58 9788468.14 98624017 root time:27.9s, loops:96318, Concurrency:8 yd_result.yd_material_summary.payment, cast(yd_result.yd_material_summary.viewcount, decimal(10,0) BINARY)->Column#216, cast(yd_result.yd_material_summary.clickcount, decimal(10,0) BINARY)->Column#217, cast(yd_result.yd_material_summary.newlogincount, decimal(10,0) BINARY)->Column#218, cast(yd_result.yd_material_summary.regdevcount, decimal(10,0) BINARY)->Column#219, cast(yd_result.yd_material_summary.reglogincount, decimal(10,0) BINARY)->Column#220, yd_result.yd_material_summary.newuserincome_after, cast(yd_result.yd_material_summary.newuserpaypeople, decimal(10,0) BINARY)->Column#222, yd_result.yd_material_summary_ltv.ltv3income_after, yd_result.yd_material_summary_ltv.ltv7income_after, yd_result.yd_material_summary.mid, yd_result.yd_material_summary.eid, yd_result.yd_material_summary.mtype, yd_result.yd_material_summary.valid_recdate, yd_result.yd_material_summary.appid, yd_result.yd_material_summary.mid, yd_result.yd_material_summary.eid, yd_result.yd_material_summary.mtype 5.81 MB N/A
    └─HashJoin_16 9788468.14 98624017 root time:27.6s, loops:96318, build_hash_table:{total:325.6ms, fetch:167.5ms, build:158.1ms}, probe:{concurrency:8, total:3m41.4s, max:28.3s, probe:3m38.8s, fetch:2.63s} inner join, equal:[eq(yd_result.yd_material_summary.mid, yd_result.yd_material_summary_ltv.mid) eq(yd_result.yd_material_summary.eid, yd_result.yd_material_summary_ltv.eid)] 62.5 MB 0 Bytes
      ├─IndexLookUp_35(Build) 446206.71 348412 root time:264.8ms, loops:342, index_task: {total_time: 76.6ms, fetch_handle: 74.2ms, build: 142.4µs, wait: 2.26ms}, table_task: {total_time: 618.8ms, num: 27, concurrency: 8} 62.1 MB N/A
      │ ├─Selection_33(Build) 557179.65 471667 cop[tikv] time:8.65ms, loops:520, cop_task: {num: 2, max: 3.94ms, min: 3.74ms, avg: 3.84ms, p95: 3.94ms, rpc_num: 2, rpc_time: 7.59ms, copr_cache_hit_ratio: 1.00}, tikv_task:{proc max:1.37s, min:58ms, p80:1.37s, p95:1.37s, iters:733, tasks:2} or(or(eq(yd_result.yd_material_summary.appid, 301), eq(yd_result.yd_material_summary.appid, 180)), or(eq(yd_result.yd_material_summary.appid, 190), or(eq(yd_result.yd_material_summary.appid, 283), and(eq(yd_result.yd_material_summary.appid, 293), in(yd_result.yd_material_summary.gameid, 1639021042620310, 1639446704207520, 1639019434662730))))) N/A N/A
      │ │ └─IndexRangeScan_31 703862.74 741026 cop[tikv] table:a, index:PRIMARY(recdate, gameid, osid, cid, aid, oid, mid, eid, site_set, pt, appid) tikv_task:{proc max:1.19s, min:51ms, p80:1.19s, p95:1.19s, iters:733, tasks:2} range:[2022-06-21,2022-07-21], keep order:false N/A N/A
      │ └─Selection_34(Probe) 446206.71 348412 cop[tikv] time:152.7ms, loops:431, cop_task: {num: 111, max: 6.27ms, min: 955µs, avg: 1.91ms, p95: 3.76ms, max_proc_keys: 616, p95_proc_keys: 318, tot_proc: 19ms, tot_wait: 16ms, rpc_num: 111, rpc_time: 207.9ms, copr_cache_hit_ratio: 0.90}, tikv_task:{proc max:131ms, min:1ms, p80:37ms, p95:73ms, iters:945, tasks:111}, scan_detail: {total_process_keys: 2853, total_process_keys_size: 951723, total_keys: 2862, rocksdb: {delete_skipped_count: 0, key_skipped_count: 5546, block: {cache_hit_count: 110, read_count: 0, read_byte: 0 Bytes}}} eq(yd_result.yd_material_summary.mtype, 2) N/A N/A
      │   └─TableRowIDScan_32 557179.65 471667 cop[tikv] table:a tikv_task:{proc max:130ms, min:1ms, p80:36ms, p95:71ms, iters:945, tasks:111} keep order:false N/A N/A
      └─TableReader_41(Probe) 449258.69 449187 root time:59.5ms, loops:444, cop_task: {num: 8, max: 149.9ms, min: 50.5ms, avg: 118.5ms, p95: 149.9ms, rpc_num: 8, rpc_time: 947.8ms, copr_cache_hit_ratio: 0.00} data:Selection_40 44.6 MB N/A
        └─Selection_40 449258.69 449187 cop[tiflash] tiflash_task:{proc max:33.5ms, min:21.8ms, p80:33.3ms, p95:33.5ms, iters:30, tasks:8, threads:8} eq(2, yd_result.yd_material_summary_ltv.mtype) N/A N/A
          └─TableFullScan_39 714973.00 656836 cop[tiflash] table:b tiflash_task:{proc max:29.3ms, min:21.8ms, p80:29ms, p95:29.3ms, iters:30, tasks:8, threads:8} keep order:false N/A N/A
```

| username: ddhe9527 | Original post link

Table b retrieved 449,187 rows from TiFlash and table a retrieved 348,412 rows from TiKV, yet the JOIN produced 98,624,017 rows. You therefore need to check whether the join columns have a many-to-many relationship that is effectively producing a Cartesian product. Even though the final HashAgg aggregates the duplicated rows back down, all 98 million intermediate rows still have to be built and probed, which is where the time goes.
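
For reference, one way to check this is to count how many rows share the same (mid, eid) pair on each side of the join. A rough sketch, with table and column names taken from the execution plan above; the mtype = 2 filters also come from the plan, but the other predicates on table a (appid, recdate range) are omitted here for brevity:

```sql
-- Find join keys that are heavily duplicated on each side. If a given
-- (mid, eid) pair appears k1 times in table a and k2 times in table b,
-- the join emits k1 * k2 rows for that pair.
SELECT mid, eid, COUNT(*) AS cnt
FROM yd_result.yd_material_summary       -- table a (build side)
WHERE mtype = 2
GROUP BY mid, eid
ORDER BY cnt DESC
LIMIT 20;

SELECT mid, eid, COUNT(*) AS cnt
FROM yd_result.yd_material_summary_ltv   -- table b (probe side)
WHERE mtype = 2
GROUP BY mid, eid
ORDER BY cnt DESC
LIMIT 20;
```

If the same keys show up many times on both sides, even modest per-key duplication can fan the 348,412 and 449,187 input rows out to the roughly 98 million join rows seen in the plan.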

| username: just_coder | Original post link

Thank you for the explanation. I didn't look carefully before and thought the join only produced about 9 million rows. This must be a problem with the SQL the developers wrote. Thanks!