After a join query, adding an extra field results in an inexplicable scan of tens of millions of records

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 连表查询后.多增加一个字段都会导致一个莫名的几千万数据扫描

| username: johnwa-CD

【TiDB Usage Environment】Production Environment
【TiDB Version】5.0.6
【Encountered Problem】SQL Performance Issue
【Reproduction Path】Operations that led to the problem
【Problem Phenomenon and Impact】
Join operation, a very common statement

CREATE TABLE FbAdAsset (
Id bigint(20) NOT NULL AUTO_INCREMENT,
date_start datetime NOT NULL,
date_stop datetime NOT NULL,
AssetId varchar(128) NOT NULL,
FbAccount varchar(128) NOT NULL,
AdCampId varchar(128) NOT NULL,
AssetUrl varchar(512) NOT NULL,
AssetThumbUrl varchar(512) DEFAULT NULL,
AssetTitle varchar(512) NOT NULL,
Amount decimal(10,2) DEFAULT NULL,
PayNum int(11) NOT NULL,
Installs int(11) DEFAULT NULL,
Clicks int(11) DEFAULT NULL,
Impressions int(11) DEFAULT NULL,
Spend decimal(10,2) DEFAULT NULL,
Reach int(11) DEFAULT NULL,
Roas decimal(10,2) NOT NULL,
IsVideo int(11) NOT NULL,
UpdateTime datetime NOT NULL,
CreateTime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (Id) /*T![clustered_index] NONCLUSTERED */,
KEY date_start (date_start,date_stop),
KEY assetid (AssetId),
KEY date_start_2 (date_start,AssetId),
KEY index_FbAccount (FbAccount),
KEY index_AssetTitle_IsVideo (AssetTitle,IsVideo)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=370170001

CREATE TABLE FbAccount (
Id int(11) NOT NULL AUTO_INCREMENT,
Account varchar(128) NOT NULL,
Secret varchar(128) DEFAULT NULL,
PageId varchar(128) DEFAULT NULL,
AppId varchar(128) DEFAULT NULL,
AppUrl varchar(500) DEFAULT NULL,
CreatTime datetime(3) NOT NULL,
ProductId int(11) NOT NULL,
ProductName varchar(128) NOT NULL,
Mt int(11) DEFAULT NULL,
Token varchar(500) DEFAULT NULL,
InsId varchar(500) DEFAULT NULL,
Status int(11) NOT NULL,
AutoFillAd int(11) NOT NULL,
UpdateStatus int(11) NOT NULL,
Chl varchar(128) DEFAULT NULL,
Core int(11) NOT NULL,
FbAdRuleId int(11) NOT NULL,
AdAutoActive int(11) NOT NULL,
StatusChangeTime datetime(3) NOT NULL,
FbAccountType int(11) NOT NULL,
RowVersion bigint(20) DEFAULT NULL,
SpendCap bigint(20) DEFAULT ‘0’,
AmountSpent bigint(20) DEFAULT ‘0’,
PutProductId int(11) NOT NULL DEFAULT ‘0’,
CurrentLanguage2 int(11) DEFAULT NULL,
AccountAdType int(11) NOT NULL DEFAULT ‘0’,
PRIMARY KEY (Id) /*T![clustered_index] CLUSTERED */,
UNIQUE KEY FbAccount_Account_IDX (Account)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=3810145

Query statement:
EXPLAIN

select ad.*,act.ProductId,act.Mt,act.Core,act.CurrentLanguage2 from FbAdAsset ad
INNER JOIN FbAccount act ON ad.FbAccount = act.Account
where date_start >= ‘2022-06-30’ AND date_start <= ‘2022-07-07’
AND Core = 1
AND ProductId = 3366 AND Mt = 1
limit 100

Execution plan

Limit_15 0.67 root offset:0, count:100
└─Projection_17 0.67 root db1.fbadasset.id, db1.fbadasset.date_start, db1.fbadasset.date_stop, db1.fbadasset.assetid, db1.fbadasset.fbaccount, db1.fbadasset.adcampid, db1.fbadasset.asseturl, db1.fbadasset.assetthumburl, db1.fbadasset.assettitle, db1.fbadasset.amount, db1.fbadasset.paynum, db1.fbadasset.installs, db1.fbadasset.clicks, db1.fbadasset.impressions, db1.fbadasset.spend, db1.fbadasset.reach, db1.fbadasset.roas, db1.fbadasset.isvideo, db1.fbadasset.updatetime, db1.fbadasset.createtime, db1.fbaccount.productid, db1.fbaccount.mt, db1.fbaccount.core, db1.fbaccount.currentlanguage2
└─IndexHashJoin_26 0.67 root inner join, inner:IndexLookUp_23, outer key:db1.fbaccount.account, inner key:db1.fbadasset.fbaccount, equal cond:eq(db1.fbaccount.account, db1.fbadasset.fbaccount)
├─TableReader_73(Build) 0.00 root data:Selection_72
│ └─Selection_72 0.00 cop[tiflash] eq(db1.fbaccount.core, 1), eq(db1.fbaccount.mt, 1), eq(db1.fbaccount.productid, 3366)
│ └─TableFullScan_71 958.00 cop[tiflash] table:act keep order:false, stats:pseudo
└─IndexLookUp_23(Probe) 701967.61 root
├─IndexRangeScan_20(Build) 37026963.56 cop[tikv] table:ad, index:index_FbAccount(FbAccount) range: decided by [eq(db1.fbadasset.fbaccount, db1.fbaccount.account)], keep order:false
└─Selection_22(Probe) 701967.61 cop[tikv] ge(db1.fbadasset.date_start, 2022-06-30 00:00:00.000000), le(db1.fbadasset.date_start, 2022-07-07 00:00:00.000000)
└─TableRowIDScan_21 37026963.56 cop[tikv] table:ad keep order:false

However, when using only one condition, there is no range: decided by [eq(db1.fbadasset.fbaccount, db1.fbaccount.account)], keep order:false. You can see that estRows returns more than 30 million rows.

------Statement:
EXPLAIN

select ad.*,act.ProductId,act.Mt,act.Core,act.CurrentLanguage2 from FbAdAsset ad
INNER JOIN FbAccount act ON ad.FbAccount = act.Account
where date_start >= ‘2022-06-30’ AND date_start <= ‘2022-07-07’
AND Core = 1

Execution plan:

Projection_10 672484.97 root db1.fbadasset.id, db1.fbadasset.date_start, db1.fbadasset.date_stop, db1.fbadasset.assetid, db1.fbadasset.fbaccount, db1.fbadasset.adcampid, db1.fbadasset.asseturl, db1.fbadasset.assetthumburl, db1.fbadasset.assettitle, db1.fbadasset.amount, db1.fbadasset.paynum, db1.fbadasset.installs, db1.fbadasset.clicks, db1.fbadasset.impressions, db1.fbadasset.spend, db1.fbadasset.reach, db1.fbadasset.roas, db1.fbadasset.isvideo, db1.fbadasset.updatetime, db1.fbadasset.createtime, db1.fbaccount.productid, db1.fbaccount.mt, db1.fbaccount.core, db1.fbaccount.currentlanguage2
└─HashJoin_42 672484.97 root inner join, equal:[eq(db1.fbaccount.account, db1.fbadasset.fbaccount)]
├─TableReader_66(Build) 0.96 root data:Selection_65
│ └─Selection_65 0.96 cop[tiflash] eq(db1.fbaccount.core, 1)
│ └─TableFullScan_64 958.00 cop[tiflash] table:act keep order:false, stats:pseudo
└─IndexLookUp_78(Probe) 3500236.02 root
├─IndexRangeScan_76(Build) 3500236.02 cop[tikv] table:ad, index:date_start_2(date_start, AssetId) range:[2022-06-30 00:00:00,2022-07-07 00:00:00], keep order:false
└─TableRowIDScan_77(Probe) 3500236.02 cop[tikv] table:ad keep order:false

It feels like just a regular join table operation query. How could it suddenly become like this, unable to query
FbAdAsset has a total of about 170 million records, FbAccount has about 958 rows
【Attachment】

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: ddhe9527 | Original post link

The JOIN methods for the act table and ad table are different in the two SQL statements. The first SQL (the one with limit 100) uses IndexHashJoin, and the ad table uses the index index_FbAccount on the associated field FbAccount. The second SQL uses HashJoin, and the ad table uses the index date_start_2 on the date_start field. It seems that the filtering capability of the index index_FbAccount is very poor, resulting in a situation where estRows is very large. You can try adding a hint to the first SQL:

/*+USE_INDEX(ad date_start_2)*/
| username: johnwa-CD | Original post link

Thank you very much, it indeed solved the problem. What I’m a bit confused about is that I just added a query condition. It seems that the index usage did not execute as expected. One condition is Mt = 1, and the other is Mt = 1 AND ProductId = 3366.

These two conditions will use different indexes, which confuses me. The Limit above is not the key; I just missed adding it when I was capturing it.

| username: ddhe9527 | Original post link

Why do I feel like it has something to do with LIMIT? :joy: You can test the second SQL by adding LIMIT 100 and see which JOIN method the optimizer chooses. With LIMIT, the CBO optimizer should be more inclined to use IndexJoin or IndexHashJoin rather than HashJoin. This also leads to differences in index selection for the driven table because, in HashJoin, indexes are not used on the join fields for both tables.

| username: johnwa-CD | Original post link

Sorry, I can’t translate images. Please provide the text you need translated.

| username: johnwa-CD | Original post link

Teacher, in the screenshots of the two query statements above, I don’t use limit. The difference is whether the condition after where is 1 or >1.

| username: johnwa-CD | Original post link

Additionally, I have another question. Why doesn’t this query use TiFlash? I tried adding various parameters, but how can I make it use TiFlash for the query? I have also added FbAdAsset to TiFlash.

set @@session.tidb_allow_batch_cop=1;
set @@session.tidb_opt_broadcast_join=1;
Adding these didn’t help. I am using version 5.0.6.

| username: johnwa-CD | Original post link

Sorry, I can’t translate images. Please provide the text you need translated.

| username: johnwa-CD | Original post link

I don’t quite understand the timing for using TiFlash.

| username: ddhe9527 | Original post link

From the diagram, it seems that the optimizer believes that after adding these two conditions, the size of the act table is already small enough, with TableReader72 being almost zero. Therefore, using the act table as the driving table for NL will be very fast, causing the driven table to use the index on the associated field.

| username: johnwa-CD | Original post link

It somewhat overturns my understanding of databases.
Because I mostly did this kind of work on SQLServer before, and after migrating to TiDB, the behavior of some statements is really hard to understand.
In SQLServer, no matter what, the optimizer should not choose the wrong index.
Because this kind of query statement is really too common.
Large table joins with a small dictionary table, then multiple condition filters.

| username: ddhe9527 | Original post link

There are no indexes on TiFlash (excluding the Min-Max index at the Pack level within the Delta Tree), so if the optimizer decides to perform a full table scan and a TiFlash replica is available, it might choose to use it. If the optimizer decides to use an index, it will definitely use TiKV. The biggest difference between your two SQL queries is the difference in the time range, and the limit also has some impact.

SQL Server also sometimes uses the wrong index, and so does Oracle. TiDB’s optimizer is still being refined.

| username: johnwa-CD | Original post link

Yes. Still, thank you for the teacher’s help in solving the above problem. This means, in other words, forcing the query to use the specified index, right?

| username: ddhe9527 | Original post link

Yes, you can use hints to make SQL use a specified index. There are many other hints as well. You can refer to the following documentation:

| username: cs58_dba | Original post link

I recently learned about PCTP, and the content in the video can be referenced:

  • Plan 1: Timely collection of statistical information
    • Consider using analyze table to manually collect statistical information, or combine it with a cron job.
    • Adjust the tidb_auto_analyze_ratio, tidb_auto_analyze_start_time, and tidb_auto_analyze_end_time parameters to increase the frequency of collection and expand the collection window time.
  • Plan 2: Change the execution plan
    • Use hints or use index statements to solidify the execution plan.
    • Use SQL hints to change the execution plan.
| username: system | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.