TiDB SQL Execution Time is Unstable

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb SQL执行时间不稳定

| username: qiuxb

To improve efficiency, please provide the following information. Clear problem descriptions can be resolved faster:
[TiDB Usage Environment]
4.0.8

[Overview] Scenario + Problem Overview
A SQL query that has been running for a long time on the online TiDB, usually taking around 0.8s, occasionally takes up to 600s without finishing. CPU and memory usage are normal, and manual execution is still fast at 0.8s. The execution plan obtained manually shows no anomalies, and the specific reason cannot be analyzed at the moment.

Execution Plan
±---------------------------------------------------------------±--------±-------------±----------------------------------------------------------------------------------------------------------
| id | estRows | task | access object
±---------------------------------------------------------------±--------±-------------±----------------------------------------------------------------------------------------------------------
| Projection_23 | 1.00 | root |
| └─TopN_26 | 1.00 | root |
| └─HashAgg_32 | 1.00 | root |
| └─Projection_133 | 0.01 | root |
| └─HashJoin_40 | 0.01 | root |
| ├─HashJoin_42(Build) | 0.01 | root |
| │ ├─IndexJoin_50(Build) | 0.00 | root |
| │ │ ├─IndexJoin_70(Build) | 0.00 | root |
| │ │ │ ├─IndexJoin_84(Build) | 0.00 | root |
| │ │ │ │ ├─HashJoin_90(Build) | 0.00 | root |
| │ │ │ │ │ ├─IndexLookUp_100(Build) | 0.00 | root |
| │ │ │ │ │ │ ├─IndexRangeScan_97(Build) | 2.30 | cop[tikv] | table:crea, index:idx_adid(advertisement_id)
| │ │ │ │ │ │ └─Selection_99(Probe) | 0.00 | cop[tikv] |
| │ │ │ │ │ │ └─TableRowIDScan_98 | 2.30 | cop[tikv] | table:crea
| │ │ │ │ │ └─Point_Get_109(Probe) | 1.00 | root | table:t_ad_advertisement
| │ │ │ │ └─TableReader_83(Probe) | 1.00 | root |
| │ │ │ └─TableRangeScan_82 | 1.00 | cop[tikv] | table:gro
| │ │ │ └─TableReader_69(Probe) | 1.00 | root |
| │ │ │ └─TableRangeScan_68 | 1.00 | cop[tikv] | table:cam
| │ │ └─IndexLookUp_49(Probe) | 1.00 | root |
| │ │ ├─Selection_48(Build) | 1.00 | cop[tikv] |
| │ │ │ └─IndexRangeScan_46 | 1.00 | cop[tikv] | table:acc, index:uniq_uuid(uuid)
| │ │ └─TableRowIDScan_47(Probe) | 1.00 | cop[tikv] | table:acc
| │ └─TableReader_122(Probe) | 231.00 | root |
| │ └─TableRangeScan_121 | 231.00 | cop[tiflash] | table:norm
| └─IndexLookUp_132(Probe) | 443.21 | root |
| ├─IndexRangeScan_129(Build) | 443.66 | cop[tikv] | table:t_statistics_stats_hour, partition:p20220802, index:idx_advertiser_hour(advertiser_id, report_time)
| └─Selection_131(Probe) | 443.21 | cop[tikv] |
| └─TableRowIDScan_130 | 443.66 | cop[tikv] | table:t_statistics_stats_hour, partition:p20220802
±---------------------------------------------------------------±--------±-------------±----------------------------------------------------------------------------------------------------------
[Background] Operations performed

[Phenomenon] Business and database phenomenon
From TiDB monitoring, at the time of SQL execution, the IO utilization of TiFlash rose to 80%.

[Problem] Current problem encountered

[Business Impact]

[TiDB Version]
4.0.8
[Application Software and Version]

[Attachments] Relevant logs and configuration information

  • TiUP Cluster Display information
  • TiUP Cluster Edit config information

Monitoring (https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana monitoring
  • TiDB Grafana monitoring
  • TiKV Grafana monitoring
  • PD Grafana monitoring
  • Corresponding module logs (including logs 1 hour before and after the problem)

If the question is about performance optimization or fault troubleshooting, please download the script and run it. Please select all and copy and paste the terminal output results for upload.

| username: qiuxb | Original post link

The current TiDB deployment architecture consists of 3 PD, 2 TiDB, and 1 TiFlash.

| username: TiDBer_jYQINSnf | Original post link

Is the execution plan for 600 seconds the same as the execution plan for 0.8 seconds?

| username: qiuxb | Original post link

Currently, it cannot be reproduced, so we don’t know what the execution plan for the 600s looks like. Additionally, would there be any issues if the query automatically uses TiFlash?

| username: ddhe9527 | Original post link

You can check the execution plan of this SQL in the dashboard or statement summary tables to see if there are multiple.

| username: qiuxb | Original post link

It seems that there is a bug in the dashboard. Last time I checked the SQL execution plan, one instance of TiDB encountered an OOM issue. Could the statement summary table also have this problem?

| username: h5n1 | Original post link

You can find slow SQL in the slowlog file or CLUSTER_SLOW_QUERY in TiDB, then use the tidb_decode_plan() function to parse the Plan_digest to get the execution plan.

| username: cheng | Original post link

select plan, plan_digest, digest_text from information_schema.statements_summary
– or statement_summary_history where digest=‘’

| username: buddyyuan | Original post link

This is expected. The dashboard is also based on these memory views. statement_summary and statement_summary_history are all memory views. Once the instance is restarted, the SQL information of the related instance is cleared.

You can only find slow SQL from the slow log and use tidb_decode_plan to convert it into an execution plan.

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.