Subquery SQL: Oracle returns results in 2 seconds, TiDB takes over 30 minutes without results

translator_bot · June 22, 2024, 9:40am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 子查询sql，oracle 2秒查出结果，tidb 30分钟没查出结果

| username: wenyi

This SQL statement takes 2 seconds to produce results in Oracle, but it is very slow in TiDB.

TiDB execution plan

translator_bot · June 22, 2024, 9:40am

| username: xfworld | Original post link

How large is the data volume of this table?
Can you share the structure?

translator_bot · June 22, 2024, 9:40am

| username: wenyi | Original post link

The main reason for the slow speed is that the SQL statement is not optimized. You can use the EXPLAIN command to analyze the execution plan of the SQL statement and see if there are any full table scans or other inefficient operations. Additionally, you can check if there are any issues with the indexes, such as missing indexes or indexes not being used.

translator_bot · June 22, 2024, 9:40am

| username: xfworld | Original post link

Well, the data volume is too small. Comparing it with MySQL might be more meaningful.

TiDB’s operator and plan optimization capabilities definitely can’t match Oracle’s, so without a certain data scale, no advantages can be demonstrated. There are ways to optimize it:

In the red box area, without operator support, it’s equivalent to aggregating to TiDB for partial processing, then completing a full table scan through TiFlash, and then aggregating again.

Aggregated twice… and there’s waiting in between…

translator_bot · June 22, 2024, 9:40am

| username: realcp1018 | Original post link

You can try the following methods:

Remove only_full_group_by from sql_mode, and then rewrite the SQL as follows:

SELECT
    max(id),
    patient_id,
    card_number 
FROM
    base.T_DCH_PATIENT_CARD 
WHERE
    card_type = 4 
    AND card_state = 1 
    AND hosp_id = 1517 
GROUP BY
    patient_id;

Under the default sql_mode, rewrite the SQL as follows:

SELECT
    t.id,
    t.patient_id,
    t.card_number 
FROM
    base.T_DCH_PATIENT_CARD t 
WHERE
    t.id IN (
    SELECT
        max(id) AS id 
    FROM
        base.T_DCH_PATIENT_CARD 
    WHERE
        card_type = 4 
        AND card_state = 1 
        AND hosp_id = 1517 
    GROUP BY
        patient_id 
    )

If you find that using TiFlash is slower, try disabling TiFlash first and then try again.

translator_bot · June 22, 2024, 9:40am

| username: tidb菜鸟一只 | Original post link

Your SQL query is definitely problematic when running on TiDB with TiFlash. I suggest using a hint to avoid TiFlash and try again. Additionally, you can rewrite the SQL as follows:

select patient_id, card_number from 
(
    select patient_id, card_number, rank() over (partition by patient_id order by id desc) as row_num
    from base.T_DCH_PATIENT_CARD ca
    where ca.card_type = 4
      and ca.card_state = 1
      and ca.hosp_id = 1517
) 
where row_num = 1;

translator_bot · June 22, 2024, 9:40am

| username: 人如其名 | Original post link

Can you post the execution plan for Oracle? It seems that TiDB hasn’t optimized the correlated subquery here. After filtering the ca table, there are still hundreds of thousands of results. Assuming the number of records to be de-correlated in the subquery is very small, optimizing away the subquery should significantly improve efficiency. However, it would be best to post the Oracle execution plan to see how it is executed.