SQL Error [8118] [HY000]: Failed to build executor. Version tidb-v6.1.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: SQL 错误 [8118] [HY000]: Failed to build executor 。版本tidb-v6.1.0

| username: 特雷西-迈克-格雷迪

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.0
【Encountered Problem】TiDB prompts: SQL Error [8118] [HY000]: Failed to build executor
【Reproduction Path】1) Contains sum in select 2) left join B on A.id=B.aid and A.num=2, the query SQL reports the above error
【Problem Phenomenon and Impact】TiDB prompts: SQL Error [8118] [HY000]: Failed to build executor

| username: db_user | Original post link

It feels like a bug: ERROR 8118 (HY000): Failed to build executor · Issue #27501 · pingcap/tidb · GitHub

| username: ShawnYan | Original post link

What specific statement was executed? grant xxxx?

| username: 特雷西-迈克-格雷迪 | Original post link

Just grant select on xxx.* to user@‘%’

| username: ShawnYan | Original post link

Was this executed in DBeaver? Which version of the MySQL connector is referenced?

| username: 特雷西-迈克-格雷迪 | Original post link

I get the same error when executing on the command line
~]# mysql -V
mysql Ver 8.0.20-11 for Linux on x86_64 (Percona Server (GPL), Release 11, Revision 159f0eb)

| username: ShawnYan | Original post link

The background information is still too little, could you please provide the minimal reproducible scenario statement?

| username: 特雷西-迈克-格雷迪 | Original post link

The default value of tidb_distsql_scan_concurrency is 15. If you have a large number of concurrent queries, you can try reducing this value to see if it helps.

| username: 特雷西-迈克-格雷迪 | Original post link

The bug is clear: after a left join, adding ‘and other_table.field’ and having sum() in the select statement will result in the error: [8118] [HY000]: Failed to build executor.

If it’s an inner join, there will be no error.

| username: Billmay表妹 | Original post link

I see the same error, it should be the BUG mentioned above ~ Surprisingly, the fix progress is visible on the GitHub link.

| username: 特雷西-迈克-格雷迪 | Original post link

I don’t know if it’s the same issue, but the scenario is different. There were no errors in version v5.4.0.

| username: onlyacat | Original post link

According to your suggestion, I constructed a simple case, but still couldn’t reproduce the issue. If possible, it would be best to provide the table creation statements and the shortest possible SQL.

| username: 特雷西-迈克-格雷迪 | Original post link

The following statement reports an error

select d.tid, smi.tid, b.tid,
SUM(CASE WHEN d.period < '2020-01' THEN d.pay ELSE 0 END) AS beforeYearAmt 
FROM testdbs.`m_info` smi
     INNER JOIN testdbs.`b2_info` d
           ON smi.code = d.code1 AND d.type1 = '03' AND smi.code3 = 'S21011'
     LEFT JOIN testdbs.`h_info` a ON smi.code4 = a.code AND smi.type3 = '01'
     LEFT JOIN testdbs.`b1_info` b ON a.btestcode = b.code
WHERE d.code2 = 'S21011'
      AND d.period <= '2021-12'
LIMIT 10
| username: 特雷西-迈克-格雷迪 | Original post link

I have already posted the steps to reproduce the issue in the reply above, please take a look.

| username: ShawnYan | Original post link

Check the current SQL mode settings?

| username: 特雷西-迈克-格雷迪 | Original post link

STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_ENGINE_SUBSTITUTION

| username: ShawnYan | Original post link

The current phenomenon is:

  1. Using the default sql_mode setting
TiDB> select d.tid,smi.tid ,b.tid , SUM(CASE WHEN d.period < '2020-01' THEN d.pay ELSE 0 END) AS
beforeYearAmt FROM testdbs.m_info smi INNER JOIN testdbs.b2_info d ON smi.code = d.code1 AND d.type1 = 03 AND smi.code3 = 'S21011' LEFT JOIN testdbs.h_info a ON smi.code4 = a.code AND smi.type3 = 01 LEFT JOIN testdbs.b1_info b ON a.btestcode = b.code WHERE d.code2 = 'S21011' AND d.period <= '2021-12' LIMIT 10;
ERROR 8123 (HY000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'tid'; this is incompatible with sql_mode=only_full_group_by
  1. After modifying sql_mode
TiDB> set sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';

TiDB> select d.tid,smi.tid ,b.tid , SUM(CASE WHEN d.period < '2020-01' THEN d.pay ELSE 0 END) AS
beforeYearAmt FROM testdbs.m_info smi INNER JOIN testdbs.b2_info d ON smi.code = d.code1 AND d.type1 = 03 AND smi.code3 = 'S21011' LEFT JOIN testdbs.h_info a ON smi.code4 = a.code AND smi.type3 = 01 LEFT JOIN testdbs.b1_info b ON a.btestcode = b.code WHERE d.code2 = 'S21011' AND d.period <= '2021-12' LIMIT 10;
ERROR 8118 (HY000): Failed to build executor
  1. After modifying sql_mode and removing AND smi.type3 = '01'
TiDB> select d.tid,smi.tid ,b.tid , SUM(CASE WHEN d.period < '2020-01' THEN d.pay ELSE 0 END) AS
beforeYearAmt FROM testdbs.m_info smi INNER JOIN testdbs.b2_info d ON smi.code = d.code1 AND d.type1 = 03 AND smi.code3 = 'S21011' LEFT JOIN testdbs.h_info a ON smi.code4 = a.code  LEFT JOIN testdbs.b1_info b ON a.btestcode = b.code WHERE d.code2 = 'S21011' AND d.period <= '2021-12' LIMIT 10;
+------+------+------+---------------+
| tid  | tid  | tid  | beforeYearAmt |
+------+------+------+---------------+
| NULL | NULL | NULL |          NULL |
+------+------+------+---------------+
1 row in set (0.01 sec)
| username: 特雷西-迈克-格雷迪 | Original post link

Step 2: Isn’t the sql_mode you set the same as mine?

Step 3: Why remove AND smi.type3 = '01'?

| username: 特雷西-迈克-格雷迪 | Original post link

If you think the SQL is not standard, you can execute it like this:

SELECT d.code2, a.code, b.code,
           SUM(CASE WHEN d.period < '2020-01' THEN d.pay ELSE 0 END) AS beforeYearAmt
FROM testdbs.m_info smi 
INNER JOIN testdbs.b2_info d ON smi.code = d.code1 
            AND d.type1 = '03' AND smi.code3 = 'S21011'
LEFT JOIN testdbs.h_info a ON smi.code4 = a.code AND smi.type3 = '01'
LEFT JOIN testdbs.b1_info b ON a.btestcode = b.code
WHERE d.code2 = 'S21011'
              AND d.period <= '2021-12'
GROUP BY d.code2, a.code, b.code
LIMIT 10;
| username: onlyacat | Original post link

I took a quick look and found that it was because the master changed some calculation methods in the basePhysicalJoin within the PhysicalPlan. I’ll take another look in the next few days.