Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: semi join 的执行计算解析bug ?

Bug Report
A relatively simple statement:
SELECT order_id FROM fin_dws.dws_fin_east_multi_year_acv_lcy
WHERE data_source = 'EAST'
Directly reports an error, Unknown column ‘order_id’ in ‘field list’
SELECT account_id,
acv_gsr_year AS gs_year,
contract_no,
lcy_total_acv AS acv2
FROM fin_dws.dws_fin2_global_acv_1d t
WHERE order_id NOT IN (
SELECT order_id FROM fin_dws.dws_fin_east_multi_year_acv_lcy
WHERE data_source = 'EAST'
)
AND acv_gsr_year IN (2023, 2024)
AND t.type = 'EAST'
LIMIT 10
Execution plan is as follows:
id,estRows,actRows,task,access object,execution info,operator info,memory,disk
Projection_11,10.00,0,root,,"time:5.31s, loops:1, RU:11.888240, Concurrency:OFF","fin_dws.dws_fin2_global_acv_1d.account_id, fin_dws.dws_fin2_global_acv_1d.acv_gsr_year, fin_dws.dws_fin2_global_acv_1d.contract_no, fin_dws.dws_fin2_global_acv_1d.lcy_total_acv",9.14 KB,N/A
└─Limit_14,10.00,0,root,,"time:5.31s, loops:1","offset:0, count:10",N/A,N/A
└─HashJoin_15,10.00,0,root,,"time:5.31s, loops:1, build_hash_table:{total:11ms, fetch:9.77ms, build:1.23ms}, probe:{concurrency:16, total:1m9.7s, max:5.31s, probe:1m8.8s, fetch:899.3ms}",CARTESIAN anti semi join,438.4 KB,0 Bytes
├─TableReader_21(Build),12426.00,12426,root,,"time:10.1ms, loops:14, cop_task: {num: 7, max: 2.66ms, min: 1.23ms, avg: 1.55ms, p95: 2.66ms, tot_proc: 11µs, tot_wait: 1.73ms, rpc_num: 7, rpc_time: 10.7ms, copr_cache_hit_ratio: 1.00, build_task_duration: 6µs, max_distsql_concurrency: 1}",data:Selection_20,95.9 KB,N/A
│ └─Selection_20,12426.00,12426,cop[tikv],,"tikv_task:{proc max:20ms, min:0s, avg: 7.43ms, p80:12ms, p95:20ms, iters:39, tasks:7}, scan_detail: {get_snapshot_time: 1.38ms, rocksdb: {block: {}}}","eq(fin_dws.dws_fin_east_multi_year_acv_lcy.data_source, ""EAST"")",N/A,N/A
│ └─TableFullScan_19,12426.00,12426,cop[tikv],table:dws_fin_east_multi_year_acv_lcy,"tikv_task:{proc max:20ms, min:0s, avg: 7.43ms, p80:12ms, p95:20ms, iters:39, tasks:7}",keep order:false,N/A,N/A
└─TableReader_18(Probe),12.50,22239,root,,"time:62.8ms, loops:23, cop_task: {num: 18, max: 40.9ms, min: 502.5µs, avg: 8.66ms, p95: 40.9ms, tot_proc: 28.7µs, tot_wait: 4.89ms, rpc_num: 18, rpc_time: 155.3ms, copr_cache_hit_ratio: 1.00, build_task_duration: 17.5µs, max_distsql_concurrency: 3}",data:Selection_17,830.3 KB,N/A
└─Selection_17,12.50,22239,cop[tikv],,"tikv_task:{proc max:120ms, min:0s, avg: 35.6ms, p80:68ms, p95:120ms, iters:141, tasks:18}, scan_detail: {get_snapshot_time: 4.05ms, rocksdb: {block: {}}}","eq(fin_dws.dws_fin2_global_acv_1d.type, ""EAST""), or(eq(cast(fin_dws.dws_fin2_global_acv_1d.acv_gsr_year, double BINARY), 2023), eq(cast(fin_dws.dws_fin2_global_acv_1d.acv_gsr_year, double BINARY), 2024))",N/A,N/A
└─TableFullScan_16,73.55,70424,cop[tikv],table:t,"tikv_task:{proc max:112ms, min:0s, avg: 34.2ms, p80:64ms, p95:112ms, iters:141, tasks:18}",keep order:false,N/A,N/A
So the CARTESIAN anti semi join here has a problem, it is not recognized.
Change to another way of writing:
SELECT account_id,
acv_gsr_year AS gs_year,
contract_no,
lcy_total_acv AS acv2
FROM fin_dws.dws_fin2_global_acv_1d t
LEFT JOIN (
SELECT order_id FROM fin_dws.dws_fin_east_multi_year_acv_lcy
WHERE data_source = 'EAST'
) multi
ON t.order_id = multi.order_id
WHERE multi.order_id IS NULL
AND acv_gsr_year IN (2023, 2024)
AND t.type = 'EAST'
LIMIT 10
This way there is no error.
Table DDL files:
dws_fin_east_multi_year_acv_lcy.ddl.sql (2.7 KB)
dws_fin2_global_acv_1d.ddl.sql (7.9 KB)
【TiDB Version】
7.5.1
【Impact of the Bug】
Affects the accuracy of statistical results.
【Possible Steps to Reproduce the Problem】
Simple SQL can reproduce:
-- NOT IN subquery
WITH t1 AS (
SELECT 1 AS n
),
t2 AS (
SELECT 2 AS b
)
SELECT *
FROM t1
WHERE n NOT IN (SELECT n FROM t2);
And IN subquery
WITH t1 AS (
SELECT 1 AS n
),
t2 AS (
SELECT 2 AS b
)
SELECT *
FROM t1
WHERE n IN (SELECT n FROM t2)
【Observed Unexpected Behavior】
The statement can be executed smoothly without error, producing unexpected results.
【Expected Behavior】
Syntax error
【Related Components and Specific Versions】
tidb
【Other Background Information or Screenshots】
Such as cluster topology, system and kernel version, application app information, etc.; if the problem is related to SQL, please provide SQL statements and related table Schema information; if there are key errors in the node logs, please provide relevant node log content or files; if some business-sensitive information is inconvenient to provide, please leave contact information, and we will communicate with you privately.