TiDB Coprocessor Execution Time is Very Long

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB Coprocessor 执行耗时 很长

| username: panqiao

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】
【Reproduction Path】What operations were performed when the issue occurred
【Encountered Issue: Problem Phenomenon and Impact】In the production environment, TiDB Dashboard shows many slow queries, and a large share of them spend most of their time in Coprocessor execution. Whenever an SQL statement has a long Coprocessor execution time, CPU usage on the corresponding TiKV node spikes. How can this be optimized, and why does the Coprocessor take so long?
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】

Almost every slow SQL has this step taking a long time. How can it be optimized?

| username: zhaokede | Original post link

I have encountered this issue before and found the following solution online:

In TiDB, long Coprocessor execution times can be caused by various factors. Here are some possible reasons, corresponding analyses, and suggestions for resolution or optimization:

1. Index Issues

  • Problem Description: The query does not effectively utilize indexes, causing the Coprocessor to scan a large amount of data.
  • Solution:
    1. Check the query to ensure it uses appropriate indexes.
    2. Consider adding indexes for frequently queried columns.
    3. Use the EXPLAIN statement to analyze the query plan and see whether indexes are being used correctly (see the sketch below).
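
A minimal sketch of this check; the table and column names are hypothetical, since the original post does not include the schema:

```sql
-- Hypothetical table and column; adjust to the real schema.
EXPLAIN SELECT * FROM orders WHERE item_type_id = 3;

-- If the plan shows TableFullScan instead of IndexRangeScan,
-- an index on the filtered column lets the Coprocessor skip most rows:
CREATE INDEX idx_orders_item_type_id ON orders (item_type_id);
```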

2. Data Volume Issues

  • Problem Description: The query involves a large volume of data, causing the Coprocessor to process more data.
  • Solution:
    1. Optimize the query to reduce unnecessary data scans.
    2. Consider using pagination to reduce the amount of data per query (a keyset-pagination sketch follows this list).
    3. If possible, partition or shard the data to improve query performance.
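
For the pagination suggestion above, keyset pagination scans far less data than a large OFFSET because it seeks directly into the index; a sketch with hypothetical table and column names:

```sql
-- Instead of SELECT ... LIMIT 100 OFFSET 700000 (which still scans 700,100 rows),
-- remember the last id of the previous page and seek past it:
SELECT id, item_type_id, created_at
FROM orders
WHERE id > 100000        -- last id returned by the previous page
ORDER BY id
LIMIT 100;
```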

3. Coprocessor Concurrency Settings

  • Problem Description: Improper concurrency settings for the Coprocessor may cause some queries to wait too long.
  • Solution:
    1. Adjust TiDB system variables such as tidb_distsql_scan_concurrency and tidb_index_lookup_concurrency to optimize query performance (see the sketch below).
    2. Configure concurrency settings based on system load and hardware resources.
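
A sketch of inspecting and adjusting these variables; the values shown are illustrative, not recommendations:

```sql
-- Check the current settings first:
SHOW VARIABLES LIKE 'tidb_distsql_scan_concurrency';
SHOW VARIABLES LIKE 'tidb_index_lookup_concurrency';

-- Raise or lower them based on TiKV CPU headroom; illustrative values only:
SET GLOBAL tidb_distsql_scan_concurrency = 30;
SET GLOBAL tidb_index_lookup_concurrency = 8;
```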

4. System Resource Bottlenecks

  • Problem Description: Insufficient CPU, memory, or disk I/O resources cause slow Coprocessor execution.
  • Solution:
    1. Monitor system resource usage to ensure TiDB has enough resources to run.
    2. Upgrade hardware resources, such as increasing CPU cores, expanding memory capacity, or using high-performance disks.
    3. Use high-performance storage devices like SSDs to improve disk I/O performance.

5. Query Optimization

  • Problem Description: The query itself has issues, such as complex subqueries or JOIN operations.
  • Solution:
    1. Simplify the query to avoid complex subqueries and JOIN operations.
    2. Use the EXPLAIN ANALYZE statement to analyze query performance, identify bottlenecks, and optimize them (see the sketch below).
    3. Consider using TiDB’s SQL optimizer or other tools to optimize the query.
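
A sketch of using EXPLAIN ANALYZE for this; the query itself is illustrative:

```sql
-- EXPLAIN ANALYZE actually runs the statement and reports, per operator,
-- the estimated rows (estRows), actual rows (actRows), and execution time:
EXPLAIN ANALYZE
SELECT o.id, o.item_type_id
FROM orders o
WHERE o.item_type_id = 3;
-- A large gap between estRows and actRows usually points to stale statistics.
```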

6. Version and Configuration Issues

  • Problem Description: An outdated TiDB version or improper configuration may cause performance issues.
  • Solution:
    1. Upgrade to the latest version of TiDB for better performance and stability.
    2. Check TiDB configuration files (such as tidb.toml and tikv.toml) to ensure they are correctly configured and match the actual hardware resources (see the sketch below).
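
Both the version and the live configuration can be checked from a SQL client; SHOW CONFIG is available in TiDB v4.0 and later:

```sql
-- Check the running version and build details:
SELECT tidb_version();

-- Inspect the live configuration of cluster components, e.g. TiKV's
-- coprocessor-related settings:
SHOW CONFIG WHERE type = 'tikv' AND name LIKE '%coprocessor%';
```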

7. Cluster Status

  • Problem Description: The status of the TiDB cluster may affect Coprocessor execution performance.
  • Solution:
    1. Regularly check the status of the TiDB cluster to ensure all nodes are in normal condition.
    2. Use TiDB monitoring tools (such as Grafana) to monitor cluster performance metrics like CPU usage, memory usage, and disk I/O (a SQL-based check follows this list).
    3. Optimize and adjust based on monitoring results.
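
Besides Grafana, a quick overview of cluster topology and node status is available from information_schema (TiDB v4.0 and later):

```sql
-- One row per TiDB/TiKV/PD instance, with version and uptime:
SELECT type, instance, version, uptime
FROM information_schema.cluster_info;
```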

Summary

Long Coprocessor execution times can be caused by various factors, including index issues, data volume issues, concurrency settings, system resource bottlenecks, query optimization, version and configuration issues, and cluster status. To resolve this issue, it is necessary to troubleshoot and optimize based on the actual situation.

| username: 有猫万事足 | Original post link

It is best to provide the execution plan. You can’t just rely on guessing.
From experience, this is generally because the index is not used, resulting in a full table scan.

| username: panqiao | Original post link

Please help take a look.

| username: panqiao | Original post link

I have provided more detailed information below. Could you help analyze the specific possible reasons?

| username: zhaokede | Original post link

Are there really that many GC versions?

| username: 小龙虾爱大龙虾 | Original post link

Just create a composite index.

| username: tidb菜鸟一只 | Original post link

Use EXPLAIN ANALYZE to check the actual execution plan and see where the slowdown occurs.

| username: panqiao | Original post link

This is indeed a bit strange.

| username: panqiao | Original post link

I’ll give it a try.

| username: panqiao | Original post link

It seems to be the same information as above. This time the SQL is not as extreme, but it still takes 5 seconds.

| username: lemonade010 | Original post link

The Coprocessor takes a long time because it scans TiKV, and item_type_id is also used in the filter. Creating a composite index on id and item_type_id would improve it a lot.
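
A sketch of that composite index; the table name is hypothetical, since the thread does not show the schema:

```sql
-- Hypothetical table name; use the actual table from the slow query.
-- With both columns in one index, TiKV can filter inside the index
-- instead of reading extra rows back:
CREATE INDEX idx_id_item_type_id ON item_table (id, item_type_id);
```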

| username: lemonade010 | Original post link

Are your statistics not being collected properly? Why is there such a big difference between the estimated and actual rows? You should also collect statistics manually.
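
A sketch of checking statistics health and refreshing them manually (table name hypothetical):

```sql
-- A Healthy value well below 100 indicates stale statistics:
SHOW STATS_HEALTHY WHERE Table_name = 'item_table';

-- Recollect statistics for the table:
ANALYZE TABLE item_table;
```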

| username: panqiao | Original post link

Where did you see that my statistics are significantly off? In theory, shouldn't TiDB collect statistics automatically at regular intervals? Are they collected with the ANALYZE TABLE command? If so, our system has 13,000 tables.
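
For reference: TiDB does collect statistics automatically in the background (auto-analyze), governed by a few system variables; a sketch of inspecting them:

```sql
-- Auto-analyze runs when the fraction of modified rows exceeds this ratio:
SHOW VARIABLES LIKE 'tidb_auto_analyze_ratio';

-- ...and only within this time window:
SHOW VARIABLES LIKE 'tidb_auto_analyze_start_time';
SHOW VARIABLES LIKE 'tidb_auto_analyze_end_time';
```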

| username: h5n1 | Original post link

It seems the chosen index is not very efficient. Please share the table structure and the full definition of the composite index.

| username: lemonade010 | Original post link

Estimated 10 rows, but actually there are more than 700,000 rows.

| username: tidb菜鸟一只 | Original post link

Your statistics are definitely inaccurate.
Estimated 10 rows, actual 700,000; it definitely won't be fast…

| username: zhaokede | Original post link

Is the data in this database newly imported?

| username: panqiao | Original post link

No, but sometimes it may grow relatively quickly.

| username: panqiao | Original post link

So how do we solve it?