The same SQL has inconsistent execution plans in the production service and when manually run in the MySQL client

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 同一个sql,在生产服务里面和手动运行的Mysql客户端,执行计划不一致

| username: TiDBer_5nopBRTA

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path]

  1. The table information is as follows:
-- Data volume: 7.6 million
CREATE TABLE `mk_wx_all_clue` (
  `_id` varchar(72) NOT NULL,
  `company_id` varchar(24) NOT NULL,
  `country` varchar(24) DEFAULT NULL,
  `platform` varchar(8) DEFAULT NULL,
  `companyId` varchar(24) DEFAULT NULL,
  `secondScene` varchar(2) DEFAULT NULL,
  ... 
  PRIMARY KEY (`_id`,`company_id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_company_id` (`company_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

-- Data volume: 9.5 million
CREATE TABLE `mk_wx_all_clue_userIds` (
  `_id` varchar(72) NOT NULL,
  `company_id` varchar(24) NOT NULL,
  `userIds_idx` bigint(20) NOT NULL,
  ...
  PRIMARY KEY (`_id`,`company_id`,`userIds_idx`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_company_id` (`company_id`),
  KEY `cid_uid` (`company_id`,`userIds`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Table information: (screenshot in the original post)

Executed SQL:

explain analyze
SELECT count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE (d.company_id = 'W00000000001' and d.userIds in ('56', '951', '1239', '3009', '2158', '2693', '2256'))
  AND a.company_id = 'W00000000001'
  AND (a.clueStatus is null or a.clueStatus = 0 or a.clueStatus = 1)

[Encountered Issue: Problem Phenomenon and Impact]

The client is a Golang application, and the connection parameters are as follows:

?charset=utf8mb4&parseTime=True&loc=PRC

The slow SQL from the application, as seen in the dashboard, has the following execution plan (screenshot in the original post):

It takes more than 3 seconds, and the plan shows no full table scan.

Running the same SQL manually on its own, the execution plan is (screenshot in the original post):

It takes only tens of milliseconds.

Question:

Why does the same SQL get different execution plans in these two environments? What could cause this?


By the way, the Golang application uses prepared statements, and all parameters are passed as ? placeholders.

| username: residentevil | Original post link

First run ANALYZE TABLE on the tables involved, then check the execution plan again.
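
For reference, a minimal sketch of that suggestion, using the table names from the post above (re-collect statistics, then compare the plan again):

-- Refresh statistics on both tables involved in the join
ANALYZE TABLE mk_wx_all_clue;
ANALYZE TABLE mk_wx_all_clue_userIds;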

| username: tidb菜鸟一只 | Original post link

Don’t post the above as screenshots; provide the complete text instead.

| username: forever | Original post link

Are the company_id values in your two SQL statements the same? In the one executed separately it looks like there are multiple values. Check the health of the table to see if anything is off, and try collecting statistics again before retesting.
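
A hedged sketch of the health check mentioned above; the schema name mongo_tx is taken from the execution plan posted later in the thread:

-- Statistics health (values near 100 are good; low values mean stale statistics)
SHOW STATS_HEALTHY WHERE Db_name = 'mongo_tx' AND Table_name = 'mk_wx_all_clue';
SHOW STATS_HEALTHY WHERE Db_name = 'mongo_tx' AND Table_name = 'mk_wx_all_clue_userIds';

If the health is low, run ANALYZE TABLE as suggested above and test again.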

| username: TiDBer_5nopBRTA | Original post link

I just tried it without prepared statements and it ran very fast. Could it be that the prepared-statement path prevents TiDB from accurately choosing the optimal execution plan?
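
One way to check this theory from the MySQL client is to go through the text-protocol PREPARE/EXECUTE path, which also uses TiDB's prepared plan cache, and then look at last_plan_from_cache. A rough sketch, with the IN list shortened to two placeholders for readability (the user-variable names are only for illustration):

PREPARE stmt FROM 'SELECT count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE d.company_id = ? and d.userIds in (?, ?)
  AND a.company_id = ?
  AND (a.clueStatus is null or a.clueStatus = 0 or a.clueStatus = 1)';
SET @c = 'W00000000001', @u1 = '56', @u2 = '951';
EXECUTE stmt USING @c, @u1, @u2, @c;

-- 1 means the last execution reused a cached plan, 0 means it was optimized from scratch
SELECT @@last_plan_from_cache;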

| username: TiDBer_5nopBRTA | Original post link

id estRows actRows task access object execution info operator info memory disk

HashAgg_11 1.00 1 root time:63.9ms, loops:2, partial_worker:{wall_time:63.868606ms, concurrency:5, task_num:1, tot_wait:317.978675ms, tot_exec:11.882µs, tot_time:317.994614ms, max:63.602127ms, p95:63.602127ms}, final_worker:{wall_time:0s, concurrency:5, task_num:1, tot_wait:318.060098ms, tot_exec:15.879µs, tot_time:318.078621ms, max:63.637674ms, p95:63.637674ms} funcs:count(1)->Column#49 4.21 KB N/A
└─IndexJoin_17 222.58 10 root time:63.7ms, loops:2, inner:{total:40.8ms, concurrency:5, task:1, construct:67.3µs, fetch:40.7ms, build:8.66µs}, probe:17.4µs inner join, inner:IndexLookUp_16, outer key:mongo_tx.mk_wx_all_clue_userids._id, inner key:mongo_tx.mk_wx_all_clue._id, equal cond:eq(mongo_tx.mk_wx_all_clue_userids._id, mongo_tx.mk_wx_all_clue._id) 115.8 KB N/A
├─IndexLookUp_31(Build) 222.58 48 root time:22.2ms, loops:3, index_task: {total_time: 8.9ms, fetch_handle: 8.9ms, build: 681ns, wait: 1.58µs}, table_task: {total_time: 13ms, num: 1, concurrency: 5} 6.41 KB N/A
│ ├─IndexRangeScan_29(Build) 222.58 48 cop[tikv] table:d, index:cid_uid(company_id, userIds) time:8.9ms, loops:3, cop_task: {num: 2, max: 7.24ms, min: 3.03ms, avg: 5.14ms, p95: 7.24ms, max_proc_keys: 39, p95_proc_keys: 39, rpc_num: 2, rpc_time: 10.3ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:3, tasks:2}, scan_detail: {total_process_keys: 48, total_process_keys_size: 4789, total_keys: 55, rocksdb: {delete_skipped_count: 0, key_skipped_count: 48, block: {cache_hit_count: 93, read_count: 0, read_byte: 0 Bytes}}} range:["W00000000001" "1239","W00000000001" "1239"], ["W00000000001" "2158","W00000000001" "2158"], ["W00000000001" "2256","W00000000001" "2256"], ["W00000000001" "2693","W00000000001" "2693"], ["W00000000001" "3009","W00000000001" "3009"], ["W00000000001" "56","W00000000001" "56"], ["W00000000001" "951","W00000000001" "951"], keep order:false N/A N/A
│ └─TableRowIDScan_30(Probe) 222.58 48 cop[tikv] table:d time:12.9ms, loops:2, cop_task: {num: 14, max: 6.68ms, min: 434.1µs, avg: 1.68ms, p95: 6.68ms, max_proc_keys: 7, p95_proc_keys: 7, tot_proc: 2ms, rpc_num: 14, rpc_time: 23.3ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:0s, p95:1ms, iters:14, tasks:14}, scan_detail: {total_process_keys: 48, total_process_keys_size: 5509, total_keys: 50, rocksdb: {delete_skipped_count: 0, key_skipped_count: 4, block: {cache_hit_count: 478, read_count: 0, read_byte: 0 Bytes}}} keep order:false N/A N/A
└─IndexLookUp_16(Probe) 1.00 10 root time:40.6ms, loops:2, index_task: {total_time: 17.3ms, fetch_handle: 17.3ms, build: 662ns, wait: 1.85µs}, table_task: {total_time: 23.2ms, num: 1, concurrency: 5} 10.5 KB N/A
├─IndexRangeScan_13(Build) 1.00 40 cop[tikv] table:a, index:PRIMARY(_id, company_id) time:17.3ms, loops:3, cop_task: {num: 11, max: 7.22ms, min: 497.9µs, avg: 3ms, p95: 7.22ms, max_proc_keys: 6, p95_proc_keys: 6, tot_proc: 3ms, tot_wait: 2ms, rpc_num: 11, rpc_time: 32.9ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:1ms, p95:1ms, iters:11, tasks:11}, scan_detail: {total_process_keys: 40, total_process_keys_size: 6720, total_keys: 80, rocksdb: {delete_skipped_count: 0, key_skipped_count: 40, block: {cache_hit_count: 548, read_count: 2, read_byte: 88.0 KB}}} range: decided by [eq(mongo_tx.mk_wx_all_clue._id, mongo_tx.mk_wx_all_clue_userids._id) eq(mongo_tx.mk_wx_all_clue.company_id, W00000000001)], keep order:false N/A N/A
└─Selection_15(Probe) 1.00 10 cop[tikv] time:23.1ms, loops:2, cop_task: {num: 18, max: 5.89ms, min: 556.4µs, avg: 2.43ms, p95: 5.89ms, max_proc_keys: 7, p95_proc_keys: 7, tot_proc: 3ms, tot_wait: 3ms, rpc_num: 18, rpc_time: 43.6ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:0s, p95:1ms, iters:18, tasks:18}, scan_detail: {total_process_keys: 40, total_process_keys_size: 19774, total_keys: 40, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 635, read_count: 0, read_byte: 0 Bytes}}} or(isnull(mongo_tx.mk_wx_all_clue.cluestatus), or(eq(mongo_tx.mk_wx_all_clue.cluestatus, 0), eq(mongo_tx.mk_wx_all_clue.cluestatus, 1))) N/A N/A
└─TableRowIDScan_14 1.00 40 cop[tikv] table:a tikv_task:{proc max:1ms, min:0s, p80:0s, p95:1ms, iters:18, tasks:18} keep order:false N/A
| username: TiDBer_5nopBRTA | Original post link

The company_id is the same; it is at the tenant level, and there are tens of thousands of tenants.

| username: forever | Original post link

Check how many execution plans there are for this SQL, and also check if execution plan binding is used.
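
A hedged sketch of both checks, assuming the statements summary tables are enabled (the LIKE pattern is just one way to locate this statement):

-- How many distinct plans (PLAN_DIGEST values) has this statement produced?
SELECT DIGEST, PLAN_DIGEST, EXEC_COUNT, AVG_LATENCY
FROM information_schema.cluster_statements_summary
WHERE DIGEST_TEXT LIKE '%mk_wx_all_clue_userIds%';

-- Is an execution plan binding in place that could be forcing a plan?
SHOW GLOBAL BINDINGS;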

| username: 小龙虾爱大龙虾 | Original post link

Look: with a.company_id = d.company_id in the join condition, the filters d.company_id = 'W00000000001' and a.company_id = 'W00000000001' are duplicates of each other. Writing the same condition twice can trip up the optimizer (in combination with execution plan caching). Try writing it only once, and also, as you already tested, don't use the prepare interface for this SQL.
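
In other words, something like the sketch below: the same query with the tenant filter written only once on the d side (a.company_id is still constrained through the join condition):

SELECT count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE d.company_id = 'W00000000001'
  AND d.userIds in ('56', '951', '1239', '3009', '2158', '2693', '2256')
  AND (a.clueStatus is null or a.clueStatus = 0 or a.clueStatus = 1)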

| username: zhanggame1 | Original post link

How about trying to analyze the same table on each TiDB node?

| username: zhang_2023 | Original post link

Collect statistics

| username: TiDBer_lBAxWjWQ | Original post link

Have the statistics been collected?

| username: tidb菜鸟一只 | Original post link

You need to post the execution plan of the other, slow SQL as well. The plan you pasted should not be slow, because it already uses an index range scan, which means it makes use of your filter conditions on company_id and userIds. In the slow SQL above, however, the cid_uid index is read with an index full scan, which should not happen; it is probably because the userIds condition is not being used.

I suspect two issues:

  1. Your TiDB server has a prepared plan cache entry for this SQL, and the slow executions from your application are reusing that cached plan, which happens to be a bad initial plan.
  2. The company_id condition appears twice: once in (d.company_id = 'W00000000001' and d.userIds in ('56', '951', '1239', '3009', '2158', '2693', '2256')) and again via a.company_id = d.company_id and a.company_id = 'W00000000001'. This sometimes makes the plan fall back to a.company_id = d.company_id and a.company_id = 'W00000000001', so it cannot use an index range scan on the cid_uid index and does an index full scan instead.

I suggest removing the condition AND a.company_id = 'W00000000001'.
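
If point 1 is the cause, one hedged way to confirm or work around it (assuming a TiDB version where this session variable exists, roughly v6.1 or later; in older versions the prepared plan cache is a config-file setting) is to turn the prepared plan cache off for a test session and compare:

-- Disable the prepared plan cache for this session only, then re-run the prepared statement
SET SESSION tidb_enable_prepared_plan_cache = OFF;

On the Go side, a comparable experiment is to add interpolateParams=true to the go-sql-driver/mysql DSN so queries are sent over the text protocol instead of as server-side prepared statements.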

| username: dba远航 | Original post link

TiDB uses distributed execution operators, while MySQL executes on a single node, so it's normal for the plans to differ.

| username: 哈喽沃德 | Original post link

Makes sense.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.