Is there room for optimization if this SQL execution exceeds 2.5 minutes?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 这个sql执行超过了2.5分钟,还有优化空间吗

| username: TiDBer_7Q5CQdQd

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1

SELECT
    industry.NAME AS industry_name,
    count('*') AS count 
FROM
    company_f5
    INNER JOIN patent_proposer_relation ON patent_proposer_relation.proposer_id = company_f5.id
    INNER JOIN patent ON patent_proposer_relation.patent_id = patent.id
    INNER JOIN patent_industry_relation ON patent_proposer_relation.patent_id = patent_industry_relation.patent_id
    INNER JOIN industry ON industry.id = patent_industry_relation.industry_id 
WHERE
    patent.approval_date IS NOT NULL 
    AND patent.type = '发明' 
    AND industry.LEVEL = 0 
    AND industry.has_chain = '是' 
    AND industry.has_display = '展示' 
    AND company_f5.display = 1 
    AND company_f5.establish_date <= '2024-01-12' 
    AND company_f5.reg_province IN ('北京市') 
    AND (industry.industry_code LIKE 'INA%' OR industry.industry_code LIKE 'INT%') 
GROUP BY
    industry.NAME 
ORDER BY
    count('*') DESC 
    LIMIT 15
| username: Kongdom | Original post link

This is executing explain, right? Try executing explain analyze.

Are the tables company_f5, patent, and patent_proposer_relation not using indexes?

| username: forever | Original post link

The entire SQL query requires very few columns, and for some tables with fewer conditions, you can consider index coverage. Designing the indexes well should significantly improve the speed.

| username: linnana | Original post link

The “explain analyze” feature is supported starting from which version? It is only available in MySQL 8.0.

| username: Kongdom | Original post link

:flushed: Hasn’t TiDB always been available?

| username: linnana | Original post link

Oh, I didn’t notice that.

| username: 春风十里 | Original post link

It’s better to pull out the explain analyze and take a look. From the current estimated execution plan, some tables are not using indexes, possibly because they don’t have any. If convenient, you can also pull out the table creation statements for a look.

For example, the company_f5 table has three conditions, none of which are using indexes:
AND company_f5.display = 1
AND company_f5.establish_date <= ‘2024-01-12’
AND company_f5.reg_province IN (‘北京市’)

The company_f5 table is estimated to scan 234,960 rows, which is quite a lot of data. If the amount of returned data can be reduced through indexing, it should improve the speed.

| username: Kongdom | Original post link

I checked the documentation and confirmed that it has indeed been supported since the earliest version v2.1.

| username: 哈喽沃德 | Original post link

The condition patent.approval_date IS NOT NULL is fatal.

| username: Kongdom | Original post link

Could you share the table structure and indexes? It seems like the main issue is that the query is not using the indexes.

| username: 哈喽沃德 | Original post link

SELECT
    industry.NAME AS industry_name,
    COUNT('') AS count
FROM
    industry
    INNER JOIN patent_industry_relation ON industry.id = patent_industry_relation.industry_id
    INNER JOIN patent_proposer_relation ON patent_industry_relation.patent_id = patent_proposer_relation.patent_id
    INNER JOIN company_f5 ON patent_proposer_relation.proposer_id = company_f5.id
    INNER JOIN patent ON patent_proposer_relation.patent_id = patent.id
WHERE
    industry.LEVEL = 0
    AND industry.has_chain = '是'
    AND industry.has_display = '展示'
    AND patent.approval_date IS NOT NULL
    AND patent.type = '发明'
    AND company_f5.display = 1
    AND company_f5.reg_province = '北京市'
    AND (industry.industry_code LIKE 'INA%' OR industry.industry_code LIKE 'INT%')
    AND company_f5.establish_date <= '2024-01-12'
GROUP BY
    industry.NAME
ORDER BY
    count DESC
LIMIT 15;
Try this.
| username: linnana | Original post link

Wow :sunglasses:, how many features has TiDB implemented before MySQL?

| username: dba远航 | Original post link

You need to look at explain analyze.

| username: FutureDB | Original post link

Use EXPLAIN ANALYZE to check the actual execution plan of your SQL and see where the main slowdown occurs. Focus on optimizing those slow steps specifically. Also, check why the two large tables company_f5 and patent are not using indexes. Determine if setting appropriate indexes can avoid full table scans.

| username: 江湖故人 | Original post link

Are these fields good for filtering? If so, please add an index.

| username: Kongdom | Original post link

:yum: Looking forward to discovering more Easter eggs~

| username: 随缘天空 | Original post link

It indeed takes too long. Try modifying the table structure to add some indexes and perform SQL optimization testing.

| username: TiDBer_7Q5CQdQd | Original post link

The image is not available for translation. Please provide the text content for translation.

| username: TiDBer_7Q5CQdQd | Original post link

There is an index.

| username: tidb菜鸟一只 | Original post link

My suggestion is to put all the tables on TiFlash directly.