TiDB 5.4 executing view remains in running state for several thousand seconds

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb5.4 执行视图,一直处于运行状态,时间达几千秒

| username: tidb126

TiDB 5.4
Memory alarm, checked long-running SQL, reached several thousand seconds. The query view, when executed separately, is very fast, but it also gets stuck when executed separately. How can I troubleshoot this? This situation happens quite often, eventually leading to memory alarms.

| username: tidb菜鸟一只 | Original post link

When querying the view, did you add conditions? Can you give an example?

| username: h5n1 | Original post link

Is there an issue with one view or all views? Are the views single-table or multi-table joins? Could you provide a slow query SQL, the view SQL, and its execution plan?

| username: tidb126 | Original post link

Query the view with the condition where 1=1 and order_date between '2023-07-01' and '2023-07-31' group by order_date order by 1 desc limit 10 offset 0.

| username: tidb126 | Original post link

Currently, there is such a situation with a view. The view performs a left join on multiple tables.

| username: tidb126 | Original post link

I saw this post tidb 5.4.0-sql卡住 - TiDB 的问答社区, it seems to be the same issue. I tested it according to indexHashJoin hang in handleTask · Issue #35638 · pingcap/tidb · GitHub, and after multiple executions, it indeed gets stuck.

| username: tidb菜鸟一只 | Original post link

Check if the execution plans are different when querying the view directly and using SQL?

| username: tidb126 | Original post link

The same SQL query view, instead of querying the base table directly with SQL, will get stuck if the query is executed multiple times with the same query statement.

Execution Plan:
*************************** 1. row ***************************
id: Projection_12
estRows: 8.39
task: root
access object:
operator info: xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.order_date, minus(Column#90, Column#91)->Column#104, minus(Column#92, Column#93)->Column#105, Column#94, Column#95, Column#92, Column#90, Column#96, Column#97, Column#93, ifnull(Column#91, 0)->Column#106, Column#98, Column#99, Column#100, Column#101, Column#102, Column#103
*************************** 2. row ***************************
id: └─TopN_15
estRows: 8.39
task: root
access object:
operator info: xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.order_date:desc, offset:0, count:10
*************************** 3. row ***************************
id: └─HashAgg_20
estRows: 8.39
task: root
access object:
operator info: group by:Column#130, funcs:sum(Column#115)->Column#90, funcs:sum(Column#116)->Column#91, funcs:count(distinct Column#117)->Column#92, funcs:count(distinct Column#118)->Column#93, funcs:count(distinct Column#119)->Column#94, funcs:count(distinct Column#120)->Column#95, funcs:count(distinct Column#121)->Column#96, funcs:count(distinct Column#122)->Column#97, funcs:count(distinct Column#123)->Column#98, funcs:count(distinct Column#124)->Column#99, funcs:sum(Column#125)->Column#100, funcs:sum(Column#126)->Column#101, funcs:sum(Column#127)->Column#102, funcs:sum(Column#128)->Column#103, funcs:firstrow(Column#129)->xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.order_date
*************************** 4. row ***************************
id: └─Projection_70
estRows: 386835.94
task: root
access object:
operator info: if(and(eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.flag, 0), not(isnull(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.right_refund_code))), 0, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.price)->Column#115, case(isnull(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.refund_price), 0, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.refund_price)->Column#116, if(eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.flag, 1), xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.voucher_code, )->Column#117, if(not(isnull(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.right_refund_code)), xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.voucher_code, )->Column#118, if(eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.flag, 1), xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.right_order_code, )->Column#119, if(eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.flag, 1), xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.member_no, )->Column#120, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.right_refund_code, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.refund_mobile, if(eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.reexchange_flag, 1), xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.member_no, )->Column#123, if(eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.reexchange_flag, 1), xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.voucher_code, )->Column#124, cast(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.reexchange_tickets, decimal(10,0) BINARY)->Column#125, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.reexchange_amount, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.bt_value, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.dk_value, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.order_date, xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.order_date
*************************** 5. row ***************************
id: └─HashJoin_31
estRows: 386835.94
task: root
access object:
operator info: left outer join, equal:[eq(xxx_xxxx_xxxxx.hhh_dy_group_purchase_detail.reexchange_inner_code, xxx_xxxx_xxxxx.dim_cinema_detail.cinema_inner_code)]
*************************** 6. row ***************************
id: ├─IndexReader_42(Build)
estRows: 1002.00
task: root
access object:
operator info: index:IndexFullScan_41
*************************** 7. row ***************************
id: │ └─IndexFullScan_41
estRows: 1002.00
task: cop[tikv]
access object: table:c, index:idx_dim_cinema_detail_inner_code(cinema_inner_code)
operator info: keep order:false, stats:pseudo
*************************** 8. row ***************************
id: └─IndexLookUp_38(Probe)
estRows: 309468.75
task: root
access object:
operator info:
*************************** 9. row ***************************
id: ├─IndexRangeScan_36(Build)
estRows: 309468.75
task: cop[tikv]
access object: table:a, index:idx_dm_spread_activity_detail(order_date)
operator info: range:[“2023-07-01”,“2023-07-31”], keep order:false, stats:pseudo
*************************** 10. row ***************************
id: └─TableRowIDScan_37(Probe)
estRows: 309468.75
task: cop[tikv]
access object: table:a
operator info: keep order:false, stats:pseudo

| username: tidb126 | Original post link

The specific SQL statement is as follows:

SELECT 
    order_date, 
    (SUM(IF(flag = '0' AND right_refund_code IS NOT NULL, 0, price)) - SUM(CASE WHEN refund_price IS NULL THEN 0 ELSE refund_price END)) AS sjsmxsje, 
    (IFNULL(COUNT(DISTINCT IF(flag = '1', voucher_code, NULL)), 0) - IFNULL(COUNT(DISTINCT IF(right_refund_code IS NOT NULL, voucher_code, NULL)), 0)) AS sjsmqzs, 
    (COUNT(DISTINCT IF(flag = '1', right_order_code, NULL))) AS right_order_num, 
    (COUNT(DISTINCT IF(flag = '1', member_no, NULL))) AS member_num, 
    (COUNT(DISTINCT IF(flag = '1', voucher_code, NULL))) AS sell_voucher_num, 
    (SUM(IF(flag = '0' AND right_refund_code IS NOT NULL, 0, price))) AS order_price, 
    (COUNT(DISTINCT right_refund_code)) AS right_refund_num, 
    (COUNT(DISTINCT refund_mobile)) AS refund_member_num, 
    (COUNT(DISTINCT IF(right_refund_code IS NOT NULL, voucher_code, NULL))) AS refund_voucher_num, 
    (IFNULL(SUM(CASE WHEN refund_price IS NULL THEN 0 ELSE refund_price END), 0)) AS efund_price, 
    (COUNT(DISTINCT IF(reexchange_flag = '1', member_no, NULL))) AS check_user_num, 
    (COUNT(DISTINCT IF(reexchange_flag = '1', voucher_code, NULL))) AS check_voucher_num, 
    SUM(reexchange_tickets) AS reexchange_tickets, 
    SUM(reexchange_amount) AS reexchange_amount, 
    SUM(bt_value) AS bt_value, 
    SUM(dk_value) AS dk_value 
FROM 
    hhh_dy_group_purchase_detail 
WHERE 
    1 = 1 
    AND order_date BETWEEN '2023-07-01' AND '2023-07-31' 
GROUP BY 
    order_date 
ORDER BY 
    1 DESC 
LIMIT 
    10 OFFSET 0;
| username: 有猫万事足 | Original post link

If it’s this bug, upgrading to 5.4.3 should include the fix.

| username: tidb126 | Original post link

Currently, only this view has this situation, other views also have base tables with left join queries.

| username: tidb126 | Original post link

I found that when I opened a session and executed this SQL, it returned the result after 4 seconds. However, in the other sessions I opened, this SQL has not finished executing and is still running. What’s going on?

| username: tidb菜鸟一只 | Original post link

You can use EXPLAIN ANALYZE to check the execution plan and see exactly where it is slow.

| username: tidb126 | Original post link

It returned data in 4 seconds, why do you still say it’s slow?

| username: tidb126 | Original post link

Is it useful to add an index if a view is a left join query of two tables without filter conditions, where one table has an index on the join field and the other does not, and the query involves a full table join?

| username: 有猫万事足 | Original post link

The estimated rows are only 386,835.94. Unless the statistics are inaccurate and the actual number of rows is much larger, 4 seconds is definitely a long time. There are related issues as well, so I strongly recommend making a backup and upgrading to see if it helps.

| username: Jellybean | Original post link

Could you please post a screenshot? The text formatting is quite difficult to read.

| username: zhanggame1 | Original post link

The current situation is that sometimes it’s fast and sometimes it gets stuck?

| username: redgame | Original post link

First, collect statistical information.

| username: tidb126 | Original post link

After the above SQL is executed on the client side and the data is returned, the SQL does not end. Other sessions can still see that this SQL is in a continuously running state.