A Small Question About "in"

translator_bot · June 23, 2024, 5:27pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 一个关于 in 的小问题

| username: CuteRay

Bug Report
【TiDB Version】 ALL

Case

【Expected Behavior】

For this type of SQL, based on intuition and obvious logic, it should report an error that the field does not exist.
However, obviously, because these fields do not have table aliases, there is an issue with field recognition, so no error will be thrown, nor will a warning be issued. I remember this small bug (not sure if it counts) already exists in MySQL. I wonder if we will fix this counter-intuitive SQL issue in the future to make it throw an error and raise an exception.

translator_bot · June 23, 2024, 5:27pm

| username: yiduoyunQ | Original post link

Sorry, I can’t access external links. Please provide the text you need translated.

translator_bot · June 23, 2024, 5:27pm

| username: 小龙虾爱大龙虾 | Original post link

MySQL should be like this as well, and it’s not counterintuitive, right? Do you find it counterintuitive if a column appears in the WHERE condition of a subquery?

translator_bot · June 23, 2024, 5:27pm

| username: jhm633 | Original post link

Hello, this is not a bug. MySQL 8.0 exhibits the same behavior. This was introduced to support correlated subqueries, allowing subqueries to reference columns from outer tables. Since col2 does not exist in test1, it is automatically matched to test2. For more details, please refer to the documentation at 子查询相关的优化 | PingCAP 文档中心.

From the execution plan below, you can see:

mysql> explain select * from t1 where id in (select id from t2 where c1="xxxxx");
+------------------------------+----------+-----------+---------------+--------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                    |
+------------------------------+----------+-----------+---------------+--------------------------------------------------+
| HashJoin_9                   | 7.99     | root      |               | semi join, equal:[eq(test.t1.id, test.t2.id)]    |
| ├─TableReader_15(Build)      | 9990.00  | root      |               | data:Selection_14                                |
| │ └─Selection_14             | 9990.00  | cop[tikv] |               | not(isnull(test.t2.id))                          |
| │   └─TableFullScan_13       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                   |
| └─TableReader_12(Probe)      | 9.99     | root      |               | data:Selection_11                                |
|   └─Selection_11             | 9.99     | cop[tikv] |               | eq(test.t1.c1, "xxxxx"), not(isnull(test.t1.id)) |
|     └─TableFullScan_10       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                   |
+------------------------------+----------+-----------+---------------+--------------------------------------------------+
7 rows in set (0.00 sec)

It is recommended to explicitly specify the table name in the subquery to avoid any ambiguity. For example, use:

delete from test1 where id in (select id from test2 t2 where t2.col2='11')