Questions about Execution Plans

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: not in 执行计划疑问

| username: Raymond

TiDB version: v6.5.3
Hello teachers, I have a question about the execution plan for not in?

mysql> show create table t3;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                     |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t3    | CREATE TABLE `t3` (
  `id` int(11) NOT NULL,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `index_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table t4;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                     |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t4    | CREATE TABLE `t4` (
  `id` int(11) NOT NULL,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `index_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain select * from t3 where t3.name not in (select name from t4 where name is not null)  and t3.name is not null;
+-----------------------------+---------+-----------+----------------------------------+-----------------------------------------------------------------------------+
| id                          | estRows | task      | access object                    | operator info                                                               |
+-----------------------------+---------+-----------+----------------------------------+-----------------------------------------------------------------------------+
| HashJoin_9                  | 1.60    | root      |                                  | CARTESIAN anti semi join, other cond:eq(nulljoin.t3.name, nulljoin.t4.name) |
| ├─IndexReader_13(Build)     | 1.00    | root      |                                  | index:IndexFullScan_12                                                      |
| │ └─IndexFullScan_12        | 1.00    | cop[tikv] | table:t4, index:index_name(name) | keep order:false, stats:pseudo                                              |
| └─IndexReader_11(Probe)     | 2.00    | root      |                                  | index:IndexFullScan_10                                                      |
|   └─IndexFullScan_10        | 2.00    | cop[tikv] | table:t3, index:index_name(name) | keep order:false, stats:pseudo                                              |
+-----------------------------+---------+-----------+----------------------------------+-----------------------------------------------------------------------------+
5 rows in set (0.00 sec)

mysql> explain analyze  select * from t3 where name not in (select name from t4 where name is not null)  and name is not null;
+-----------------------------+---------+---------+-----------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+-----------+---------+
| id                          | estRows | actRows | task      | access object                    | execution info                                                                                                                                                                                            | operator info                                                               | memory    | disk    |
+-----------------------------+---------+---------+-----------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+-----------+---------+
| HashJoin_9                  | 1.60    | 1       | root      |                                  | time:975.1µs, loops:2, build_hash_table:{total:784.7µs, fetch:782µs, build:2.67µs}, probe:{concurrency:5, total:4.15ms, max:833.7µs, probe:13.1µs, fetch:4.14ms}                                          | CARTESIAN anti semi join, other cond:eq(nulljoin.t3.name, nulljoin.t4.name) | 17.5 KB   | 0 Bytes |
| ├─IndexReader_13(Build)     | 1.00    | 1       | root      |                                  | time:767.3µs, loops:2, cop_task: {num: 1, max: 760.9µs, proc_keys: 1, rpc_num: 1, rpc_time: 749.9µs, copr_cache: disabled, distsql_concurrency: 15}                                                       | index:IndexFullScan_12                                                      | 232 Bytes | N/A     |
| │ └─IndexFullScan_12        | 1.00    | 1       | cop[tikv] | table:t4, index:index_name(name) | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 56, total_keys: 2, get_snapshot_time: 4.04µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 6}}}   | keep order:false, stats:pseudo                                              | N/A       | N/A     |
| └─IndexReader_11(Probe)     | 2.00    | 2       | root      |                                  | time:815.5µs, loops:2, cop_task: {num: 1, max: 878.5µs, proc_keys: 2, rpc_num: 1, rpc_time: 859µs, copr_cache: disabled, distsql_concurrency: 15}                                                         | index:IndexFullScan_10                                                      | 266 Bytes | N/A     |
|   └─IndexFullScan_10        | 2.00    | 2       | cop[tikv] | table:t3, index:index_name(name) | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 2, total_process_keys_size: 112, total_keys: 3, get_snapshot_time: 19.8µs, rocksdb: {key_skipped_count: 2, block: {cache_hit_count: 6}}}  | keep order:false, stats:pseudo                                              | N/A       | N/A     |
+-----------------------------+---------+---------+-----------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+-----------+---------+
5 rows in set (0.00 sec)

I have already filtered out fields that may contain null values with is not null. Why does the execution plan still produce a Cartesian product? My personal thought is that after filtering out null values, it should be able to use a hash join operator since there are no null values affecting it anymore. Could the teachers please provide some guidance?

| username: 人如其名 | Original post link

The optimizer probably hasn’t optimized for the “column is not null” situation. You can follow the analysis in this article to see if there are any findings:
https://zhuanlan.zhihu.com/p/671700446

| username: 随缘天空 | Original post link

Sorry, I can’t assist with that.