Logical Optimization Fails to Eliminate Redundant Sorting

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 逻辑优化未能消除无用排序

| username: 人如其名

For the orders table in the tpch schema, there is the following query:

select max(o_clerk), min(o_clerk) from (select o_clerk from orders order by o_clerk desc) a;

Viewing the execution plan:

mysql> explain select max(o_clerk), min(o_clerk) from (select o_clerk from orders order by o_clerk desc) a;
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| id                         | estRows     | task      | access object | operator info                                                                          |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| StreamAgg_8                | 1.00        | root      |               | funcs:max(tpch1.orders.o_clerk)->Column#10, funcs:min(tpch1.orders.o_clerk)->Column#11 |
| └─Sort_13                  | 15000000.00 | root      |               | tpch1.orders.o_clerk:desc                                                              |
|   └─TableReader_12         | 15000000.00 | root      |               | data:TableFullScan_11                                                                  |
|     └─TableFullScan_11     | 15000000.00 | cop[tikv] | table:orders  | keep order:false                                                                       |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

It can be seen that a Sort operation is still performed.
Expectation: The optimizer rewrites the statement to: select max(o_clerk), min(o_clerk) from orders;
Expected execution plan:

mysql> explain select max(o_clerk), min(o_clerk) from orders;
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| id                         | estRows     | task      | access object | operator info                                                                          |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| StreamAgg_16               | 1.00        | root      |               | funcs:max(Column#14)->Column#10, funcs:min(Column#15)->Column#11                       |
| └─TableReader_17           | 1.00        | root      |               | data:StreamAgg_8                                                                       |
|   └─StreamAgg_8            | 1.00        | cop[tikv] |               | funcs:max(tpch1.orders.o_clerk)->Column#14, funcs:min(tpch1.orders.o_clerk)->Column#15 |
|     └─TableFullScan_15     | 15000000.00 | cop[tikv] | table:orders  | keep order:false                                                                       |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)

Can this type of issue be optimized and enhanced?

| username: 我是咖啡哥 | Original post link

Still not that intelligent :grinning:

| username: tidb菜鸟一只 | Original post link

This is a classic subquery. It will definitely prioritize executing the part “select o_clerk from orders order by o_clerk desc” first…

| username: 人如其名 | Original post link

The elimination of invalid associations also needs to be strengthened, as follows:

mysql> show create table customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `C_CUSTKEY` bigint(20) NOT NULL,
  `C_NAME` varchar(25) NOT NULL,
  `C_ADDRESS` varchar(40) NOT NULL,
  `C_NATIONKEY` bigint(20) NOT NULL,
  `C_PHONE` char(15) NOT NULL,
  `C_ACCTBAL` decimal(15,2) NOT NULL,
  `C_MKTSEGMENT` char(10) NOT NULL,
  `C_COMMENT` varchar(117) NOT NULL,
  PRIMARY KEY (`C_CUSTKEY`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> show create table nation \G
*************************** 1. row ***************************
       Table: nation
Create Table: CREATE TABLE `nation` (
  `N_NATIONKEY` bigint(20) NOT NULL,
  `N_NAME` char(25) NOT NULL,
  `N_REGIONKEY` bigint(20) NOT NULL,
  `N_COMMENT` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`N_NATIONKEY`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> explain select count(*) from customer a left join nation b on a.C_NATIONKEY=b.N_NATIONKEY;
+-------------------------------+------------+-----------+---------------+---------------------------------------------------------------------------------+
| id                            | estRows    | task      | access object | operator info                                                                   |
+-------------------------------+------------+-----------+---------------+---------------------------------------------------------------------------------+
| HashAgg_8                     | 1.00       | root      |               | funcs:count(1)->Column#13                                                       |
| └─HashJoin_19                 | 1500000.00 | root      |               | left outer join, equal:[eq(tpch.customer.c_nationkey, tpch.nation.n_nationkey)] |
|   ├─TableReader_24(Build)     | 25.00      | root      |               | data:TableFullScan_23                                                           |
|   │ └─TableFullScan_23        | 25.00      | cop[tikv] | table:b       | keep order:false, stats:pseudo                                                  |
|   └─TableReader_22(Probe)     | 1500000.00 | root      |               | data:TableFullScan_21                                                           |
|     └─TableFullScan_21        | 1500000.00 | cop[tikv] | table:a       | keep order:false                                                                |
+-------------------------------+------------+-----------+---------------+---------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

In fact, since the N_NATIONKEY of the nation table is the primary key, it should directly use the primary key of the customer for the count. I hope this can also be optimized at the logical optimization level to eliminate the association.
| username: Raymond | Original post link

MySQL automatically rewrote it.

| username: Raymond | Original post link

This statement was not successfully rewritten by MySQL either.

| username: 人如其名 | Original post link

If someone else can do it, we can’t fall behind.

| username: 人如其名 | Original post link

Fixed after version 7.2, related ISSUES:

| username: yytest | Original post link

Try upgrading to the latest 7.5.1 LTS version.

| username: zhaokede | Original post link

You can only write the subquery yourself;
You can try upgrading the version in the test environment.

| username: 数据库真NB | Original post link

Quickly optimize the minor version, this shouldn’t be difficult.