Why doesn't the OR condition use index merge?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: OR条件为何不走索引合并??

| username: 海石花47

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1
[Encountered Issue: Problem Phenomenon and Impact]

When using select .. from a join b on a.primary_key = b.unique_key where a.time >= xxx OR b.time >= xxx (both time fields are indexed), the query is very slow. After checking the explain plan, it seems that index merge is not being utilized.

The specific SQL is as follows:

SELECT
    a.uid,
    CASE
        WHEN a.reg_area IN (111, 0) THEN
            b.quyubh
        ELSE
            a.reg_area
    END reg_area,
    CASE
        WHEN location_area = 0
        AND REGIONCODE IS NOT NULL THEN
            REGIONCODE
        ELSE
            a.location_area
    END location_area
FROM
    `dwd`.`user_extend` a
    JOIN `dwd`.`user_first_location` b ON b.zhanghao = a.uid
WHERE
    a.update_time >= '2023-03-06 00:00:00' OR b.update_time >= '2023-03-06 00:00:00';

[Attachment: Screenshot/Logs/Monitoring]

| username: 海石花47 | Original post link

In the case of OR, the execution plan is as follows:

Projection_9  19376062.00  root    dwd.user_extend.uid, case(in(dwd.user_extend.reg_area, 912000000041844186, 0), dwd.user_first_location.quyubh, dwd.user_extend.reg_area)->Column#84, case(and(eq(dwd.user_extend.location_area, 0), not(isnull(dwd.user_first_location.regioncode))), dwd.user_first_location.regioncode, dwd.user_extend.location_area)->Column#85
└─Projection_10  19376062.00  root    dwd.user_extend.uid, dwd.user_extend.reg_area, dwd.user_extend.location_area, dwd.user_first_location.quyubh, dwd.user_first_location.regioncode
  └─IndexJoin_16  19376062.00  root    inner join, inner:TableReader_13, outer key:dwd.user_first_location.weimaihao, inner key:dwd.user_extend.uid, equal cond:eq(dwd.user_first_location.weimaihao, dwd.user_extend.uid), other cond:or(ge(dwd.user_extend.update_time, 2023-03-06 00:00:00.000000), ge(dwd.user_first_location.update_time, 2023-03-06 00:00:00.000000))
    ├─TableReader_45(Build)  19376062.00  root    data:Selection_44
    │ └─Selection_44  19376062.00  cop[tikv]    not(isnull(dwd.user_first_location.weimaihao))
    │   └─TableFullScan_43  19398524.00  cop[tikv]  table:b  keep order:false
    └─TableReader_13(Probe)  1.00  root    data:TableRangeScan_12
      └─TableRangeScan_12  1.00  cop[tikv]  table:a  range: decided by [dwd.user_first_location.weimaihao], keep order:false
| username: 海石花47 | Original post link

Other notes:

  1. Both sides of the OR condition are primary keys/unique keys, and update_time is indexed.
  2. If I split the OR condition and query a.update_time >= xx and b.update_time >= xx separately, it takes only a few seconds.
  3. Once the OR condition is added, it takes over a minute.
  4. Both tables have tens of millions of rows.
| username: tidb菜鸟一只 | Original post link

If you use the OR condition, it’s definitely impossible to use the index first. If the index is used first, what happens to the values that are filtered out and can be associated? So, it definitely scans the entire tables of both tables first, and then filters. The subsequent filtering definitely won’t use the index.

| username: 海石花47 | Original post link

So, is the prerequisite for index merge an OR query condition on a single table? It won’t perform index merge for two tables in a multi-table join, right?

| username: xfworld | Original post link

Change it to filter first, so the index will take effect. Then aggregate, it will be faster with a smaller range.

| username: tidb菜鸟一只 | Original post link

If you change it to “and,” you can filter it. After joining the two tables, the “or” condition filtering definitely won’t use index merging.

| username: Running | Original post link

Switch to using IN: WHERE a.update_time IN ('2023-03-06 00:00:00', '2023-03-06 00:00:00')

| username: 海石花47 | Original post link

Huh? I don’t understand…

| username: 海石花47 | Original post link

That definitely can use indexes. It seems that index merge cannot be used for join queries, right?

| username: 海石花47 | Original post link

I changed it a few times, but the results still seem incorrect… Can you please tell me how to fix it? I’m not very good at SQL :joy:

| username: 海石花47 | Original post link

Is it to directly split the OR and then write it as two SQL statements with a union added? It seems the result is correct, and each uses an index. Is this the correct way to modify it?

| username: 胡杨树旁 | Original post link

If both sides of the OR condition on the same table have indexes, it should be able to use index_merge. For multi-table joins, it probably won’t be applicable.

| username: buddyyuan | Original post link

You can try using index merge

| username: 海石花47 | Original post link

Tried it, didn’t work. Index merge should only be applicable to a single table.

| username: 海石花47 | Original post link

Yes, I tried it and it does seem that index merge is available for single tables.

| username: 海石花47 | Original post link

This works, tested and confirmed.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.