Issue of Predicate Conditions Not Being Pushed Down in Join Queries

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关联查询中谓词条件无法下推的问题

| username: yeminhua

When performing a join query, the predicate cannot be pushed down to the view v_dwa_loss_ymh. The base table of v_dwa_loss_ymh is dwa_loss_act_17_d_daily, which has an index on etl_cycle. Without predicate pushdown, it results in a full table scan.

If you query the view directly or add the etl_cycle condition directly to the view during the join, the predicate can be pushed down to use the index.

| username: forever | Original post link

How about trying with BETWEEN AND?

| username: h5n1 | Original post link

The first execution plan evaluation went to TiFlash. Try forcing it to use TiKV to see if it uses the index, by adding the hint /*+ READ_FROM_STORAGE( TIKV[t2_name [, tl_name …]]) */.

| username: yeminhua | Original post link

It’s useless, it still performs a full table scan on TiKV.

| username: yeminhua | Original post link

Still the same.

| username: tidb菜鸟一只 | Original post link

No, the etl_cycle field in the df view currently has an index, and then it is inner joined with t1. After that, the filter condition is on the etl_cycle field of the t1 table, right? Try changing the filter condition to df.etl_cycle >= 20231031 and df.etl_cycle <= 20231031…

| username: zhanggame1 | Original post link

Try writing the full SQL,
Since t1.etl_cycle = df.etl_cycle, and the where clause already checks t1.etl_cycle, you can also add
df.etl_cycle >= and <= conditions.

| username: yeminhua | Original post link

Writing it all out is possible. My third picture shows it written out completely, but theoretically, it shouldn’t be necessary. It should be passed along. If we need to change the SQL, wouldn’t that mean a large number of related SQLs might also need to be changed?

| username: yeminhua | Original post link

If changed to df.etl_cycle >= 20231031 and df.etl_cycle <= 20231031, the view uses the index, but the t1 table stops using partition pruning and scans all partitions instead.


It seems that only when both df and t1 specify the exact value of etl_cycle will it work properly.

| username: h5n1 | Original post link

What is the value of the tidb_partition_prune_mod variable? If it’s static pruning, it won’t work.

| username: tidb菜鸟一只 | Original post link

Theoretically, predicates can be pushed down in an inner join, but not in an outer join… Can you share the table creation statement for the view?

| username: yeminhua | Original post link

The default is dynamic, hasn’t been changed.

| username: yeminhua | Original post link

CREATE VIEW dw.v_dwa_loss_ymh
(etl_cycle, cycle, prod_inst_id, cust_id, loss_type) 
AS
 SELECT t.etl_cycle AS etl_cycle,
        t.cycle AS cycle,
        t.prod_inst_id AS prod_inst_id,
        t.cust_id AS cust_id,
        t.dtv_user_type AS loss_type 
 FROM dw.dwa_loss_act_17_d_daily AS t WHERE t.dtv_user_type IN (1008,1009,1012,1011,1013,1010)
| username: zhanggame1 | Original post link

SQL does not pass filter conditions.

| username: yeminhua | Original post link

I tried again, and forcing the hint index or specifying the exact value of df.etlcycle allows the index to be used, but just specifying tikv does not use the index. Why does directly writing the value of df.etl_cycle allow the index to be used, and how is it different from passing it from t1?

| username: yeminhua | Original post link

Impossible, right?

| username: forever | Original post link

Check the character set and collation of this field in both tables.

| username: yeminhua | Original post link

The difference between the two tables’ etl_cycle is that t1.etl_cycle is int(11), while df.etl_cycle is bigint(20).

| username: yeminhua | Original post link

When I forced the use of the index at the same time, without specifying the exact value of df.etl_cycle, the predicate was not pushed down and it took 1 minute. When I specified the exact value of df.etl_cycle, the predicate was pushed down and it only took 10 seconds, which is a huge difference in efficiency. Could it really be as you said, that the SQL does not pass the filter condition of the associated field?

| username: tidb菜鸟一只 | Original post link

The inner join predicate can be pushed down. What is the index on the dwa_loss_act_17_d_daily table like? Is there a separate index for the etl_cycle field?