Is there room for optimization in this SQL?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 这个sql还有优化空间吗

| username: TiDBer_7Q5CQdQd

Explain SELECT
company.NAME AS NAME,
company.reg_province AS province,
count( ‘’ ) AS count
FROM
standard_drafting_unit_relation
INNER JOIN company ON standard_drafting_unit_relation.drafting_unit_id = company.id
INNER JOIN standard ON standard_drafting_unit_relation.standard_id = standard.id
INNER JOIN ( SELECT company_industry_relation.company_id AS company_id FROM company_industry_relation WHERE company_industry_relation.industry_id IN ( ‘INB0101’ ) GROUP BY company_industry_relation.company_id ) AS a ON a.company_id = company.id
INNER JOIN (
SELECT
patent_proposer_relation.proposer_id AS company_id
FROM
patent_proposer_relation
INNER JOIN patent ON patent_proposer_relation.patent_id = patent.id
WHERE
patent.approval_date IS NOT NULL
AND patent.type = ‘Invention’
GROUP BY
patent_proposer_relation.proposer_id
) AS b ON b.company_id = company.id
WHERE
standard.category = ‘National Standard’
AND company.display = 1
AND company.status_std IN ( ‘Active (Operating, Open, In Business)’ )
AND company.establish_date <= ‘2023-12-24’
AND company.reg_province IN ( ‘Hebei Province’ )
GROUP BY
company.NAME,
company.reg_province
ORDER BY
count( '
’ ) DESC
LIMIT 30;

| username: forever | Original post link

Post the execution plan in the text, and include the complete operator info.

| username: Kongdom | Original post link

It is recommended to execute explain analyze to obtain the most accurate execution plan.

| username: wangccsy | Original post link

What kind of business is this, OP? Why don’t you optimize such a long SQL query?

| username: 小龙虾爱大龙虾 | Original post link

+1 :joy:

| username: andone | Original post link

Post the results of EXPLAIN ANALYZE and the table structure for review. Generally, check the statistics, indexes, and join methods.

| username: Kongdom | Original post link

Try changing this part to the where condition like this:
AND company.id IN (SELECT company_id FROM company_industry_relation WHERE industry_id IN ('INB0101'))

Handle this part in the same way:

INNER JOIN (
	SELECT patent_proposer_relation.proposer_id AS company_id FROM patent_proposer_relation
	INNER JOIN patent ON patent_proposer_relation.patent_id = patent.id
	WHERE patent.approval_date IS NOT NULL
	AND patent.type = '发明'
	GROUP BY patent_proposer_relation.proposer_id
) AS b ON b.company_id = company.id
| username: Kongdom | Original post link

Please provide the formatted SQL, it looks too uncomfortable without formatting~ OCD can’t stand it~

SELECT
    company.NAME AS NAME,
    company.reg_province AS province,
    count('') AS count
FROM standard_drafting_unit_relation
INNER JOIN company ON standard_drafting_unit_relation.drafting_unit_id = company.id
INNER JOIN standard ON standard_drafting_unit_relation.standard_id = standard.id
INNER JOIN (
    SELECT company_id 
    FROM company_industry_relation 
    WHERE industry_id IN ('INB0101') 
    GROUP BY company_id 
) AS a ON a.company_id = company.id
INNER JOIN (
    SELECT patent_proposer_relation.proposer_id AS company_id 
    FROM patent_proposer_relation
    INNER JOIN patent ON patent_proposer_relation.patent_id = patent.id
    WHERE patent.approval_date IS NOT NULL
    AND patent.type = 'invention'
    GROUP BY patent_proposer_relation.proposer_id
) AS b ON b.company_id = company.id
WHERE standard.category = 'national standard'
AND company.display = 1
AND company.status_std IN ('active (operating, open, in business)')
AND company.establish_date <= '2023-12-24'
AND company.reg_province IN ('Hebei Province')
GROUP BY company.NAME, company.reg_province
ORDER BY count('') DESC
LIMIT 30;
| username: TiDBer_7Q5CQdQd | Original post link

Originally, it was written as a subquery, and it was also very slow. Then, I took others’ advice and wrote it with this kind of join.

| username: 有猫万事足 | Original post link

It seems that not all tables have TiFlash replicas. Add TiFlash replicas to all tables and then try executing MPP. The improvement for this type of group by will be relatively significant.

| username: Kongdom | Original post link

Please post the table creation statement for us to take a look. If the amount of data in the sub-table is relatively small, using ‘in’ should be faster.

| username: 连连看db | Original post link

Is this SQL from a state-owned enterprise? :joy: We need to check the table structure of standard_drafting_unit_relation. It’s surprising that TiFlash is being used here.

| username: dba远航 | Original post link

I think the execution plan is okay.

| username: oceanzhang | Original post link

The person who wrote this sentence is also a genius. Why use so many subqueries?

| username: oceanzhang | Original post link

To optimize, you first need to rewrite the SQL. I also don’t understand why you use so many subqueries for the join.

| username: xingzhenxiang | Original post link

Removing subqueries might improve performance in some cases, but it depends on the specific context and database optimization.