SQL is consistently slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sql不间断的慢

| username: jboracle1981

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.1
[Reproduction Path] Called through a business interface; the same SQL query sometimes runs fast and sometimes slow
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]






```
id  task  estRows  operator info  actRows  execution info  memory  disk
TableReader_68 root 1 MppVersion: 1, data:ExchangeSender_67 61 time:28.9s, loops:2, cop_task: {num: 3, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00} 4.76 KB N/A
└─ExchangeSender_67 cop[tiflash] 1 ExchangeType: PassThrough 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
  └─Projection_12 cop[tiflash] 1 round(Column#10, 2)->Column#11, substring(cast(energy.ads_iot_current_value_minute.data_time, var_string(19)), 1, 16)->Column#12, energy.ads_iot_current_value_minute.measure_tag 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
    └─Projection_62 cop[tiflash] 1 Column#10, energy.ads_iot_current_value_minute.measure_tag, energy.ads_iot_current_value_minute.data_time 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
      └─HashAgg_60 cop[tiflash] 1 group by:Column#30, Column#31, funcs:sum(Column#27)->Column#10, funcs:firstrow(Column#28)->energy.ads_iot_current_value_minute.measure_tag, funcs:firstrow(Column#29)->energy.ads_iot_current_value_minute.data_time, stream_count: 32 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
        └─Projection_69 cop[tiflash] 17.90 energy.ads_iot_current_value_minute.data_value, energy.ads_iot_current_value_minute.measure_tag, energy.ads_iot_current_value_minute.data_time, substring(cast(energy.ads_iot_current_value_minute.data_time, var_string(19)), 1, 16)->Column#30, energy.ads_iot_current_value_minute.measure_tag, stream_count: 32 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
          └─Selection_37 cop[tiflash] 17.90 eq(Column#9, 1), stream_count: 32 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
            └─Window_39 cop[tiflash] 22.38 row_number()->Column#9 over(partition by energy.ads_iot_current_value_minute.measure_point_id, energy.ads_iot_current_value_minute.measure_tag, energy.ads_iot_current_value_minute.data_time order by energy.ads_iot_current_value_minute.create_time desc rows between current row and current row), stream_count: 32 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
              └─Sort_44 cop[tiflash] 22.38 energy.ads_iot_current_value_minute.measure_point_id, energy.ads_iot_current_value_minute.measure_tag, energy.ads_iot_current_value_minute.data_time, energy.ads_iot_current_value_minute.create_time:desc, stream_count: 32 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
                └─ExchangeReceiver_43 cop[tiflash] 22.38 stream_count: 32 61 tiflash_task:{proc max:28s, min:28s, avg: 28s, p80:28s, p95:28s, iters:1, tasks:2, threads:64} N/A N/A
                  └─ExchangeSender_42 cop[tiflash] 22.38 ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: energy.ads_iot_current_value_minute.measure_tag, collate: utf8mb4_bin], stream_count: 32 61 tiflash_task:{proc max:28.7s, min:0s, avg: 14.4s, p80:28.7s, p95:28.7s, iters:61, tasks:2, threads:77} N/A N/A
                    └─Selection_41 cop[tiflash] 22.38 eq(energy.ads_iot_current_value_minute.measure_tag, GEN.P), ge(energy.ads_iot_current_value_minute.data_time, 2024-05-24 00:00:00.000000), ge(energy.ads_iot_current_value_minute.data_value, 0), le(energy.ads_iot_current_value_minute.data_time, 2024-05-24 23:59:59.000000) 61 tiflash_task:{proc max:28.7s, min:0s, avg: 14.4s, p80:28.7s, p95:28.7s, iters:61, tasks:2, threads:77} N/A N/A
                      └─TableFullScan_40 cop[tiflash] 989009.75 table:ads, pushed down filter:eq(energy.ads_iot_current_value_minute.measure_point_id, be06c45012f711ed9eb91c8cf913d9a6), keep order:false, PartitionTableScan:true 3233 tiflash_task:{proc max:28.7s, min:0s, avg: 14.4s, p80:28.7s, p95:28.7s, iters:61, tasks:2, threads:77}, tiflash_scan:{dtfile:{total_scanned_packs:3490, total_skipped_packs:10830, total_scanned_rows:28492955, total_skipped_rows:88357511, total_rs_index_load_time: 0ms, total_read_time: 1373ms}, total_create_snapshot_time: 0ms, total_local_region_num: 0, total_remote_region_num: 44} N/A N/A
```
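Read bottom-up, the plan corresponds to a query of roughly this shape (a reconstruction from the operator info above, not the exact production statement; the filter literals and column names are taken from the plan, the alias `t` is added for illustration):

```sql
SELECT ROUND(SUM(data_value), 2)                     AS total_value,
       SUBSTRING(CAST(data_time AS CHAR(19)), 1, 16) AS minute_bucket,
       measure_tag
FROM (
    -- keep only the latest row per (measure_point_id, measure_tag, data_time)
    SELECT data_value, measure_tag, data_time,
           ROW_NUMBER() OVER (PARTITION BY measure_point_id, measure_tag, data_time
                              ORDER BY create_time DESC) AS rn
    FROM energy.ads_iot_current_value_minute
    WHERE measure_point_id = 'be06c45012f711ed9eb91c8cf913d9a6'
      AND measure_tag = 'GEN.P'
      AND data_time BETWEEN '2024-05-24 00:00:00' AND '2024-05-24 23:59:59'
      AND data_value >= 0
) t
WHERE rn = 1
GROUP BY SUBSTRING(CAST(data_time AS CHAR(19)), 1, 16), measure_tag;
```

Note that only the `measure_point_id` filter is pushed down to the TableFullScan; the other predicates are applied in `Selection_41` after scanning.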
| username: jboracle1981 | Original post link

The one at the bottom is the SQL execution plan.

| username: jboracle1981 | Original post link

Experts, please take a look.

| username: jboracle1981 | Original post link

When it’s fast, it’s within milliseconds. When it’s slow, it’s nearly 30 seconds.

| username: jboracle1981 | Original post link

Is there anyone from the vendor who can help take a look? This issue has occurred several times in the production environment.

| username: xfworld | Original post link

Use the dashboard’s Top SQL feature to check the slow SQL…

It’s best to provide more reference information, such as structure, data scale, data processing methods, and other relevant information.
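Besides the dashboard, the slow-query log can be queried directly with SQL. A sketch against TiDB's `information_schema.cluster_slow_query` table (the time window and `LIKE` filter are just illustrations, adjust to the statement in question):

```sql
-- Pull recent slow executions of the statement and note which plan each used.
SELECT time, query_time, plan_digest, query
FROM information_schema.cluster_slow_query
WHERE time > NOW() - INTERVAL 1 DAY
  AND query LIKE '%ads_iot_current_value_minute%'
ORDER BY query_time DESC
LIMIT 10;
```

If the slow runs share one `plan_digest` and the fast runs another, that confirms a plan-switching problem rather than a resource problem.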

| username: jboracle1981 | Original post link

Okay, okay.

| username: jboracle1981 | Original post link

The table is about 1.5 TB.

| username: jboracle1981 | Original post link

This SQL, which is forced to use TiFlash, is generally quite fast. However, when the business interface is called, it can suddenly become very slow.
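For context, TiDB offers two common ways to pin a query to TiFlash. A sketch (the table name comes from the plan above; the `COUNT(*)` query is only a placeholder):

```sql
-- Per-statement optimizer hint:
SELECT /*+ READ_FROM_STORAGE(TIFLASH[ads]) */ COUNT(*)
FROM energy.ads_iot_current_value_minute AS ads;

-- Or restrict the whole session to TiFlash-capable engines:
SET SESSION tidb_isolation_read_engines = 'tidb,tiflash';
```

With the hint, the storage engine is fixed, but the optimizer can still choose between different TiFlash execution modes.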

| username: xfworld | Original post link

Is this SQL always forcing the tiflash hint?

| username: jboracle1981 | Original post link

[The original post was an image and was not translated.]

| username: jboracle1981 | Original post link

Yes.

| username: xfworld | Original post link

The key point is that I didn’t see any information about the difference in speed. Is there a significant difference for the same SQL? How many seconds is the difference?

| username: jboracle1981 | Original post link

When it’s fast, it’s basically around 1 second.

| username: yytest | Original post link

Optimize SQL Queries:
Based on the analysis results, optimize SQL queries by adding necessary indexes, restructuring query logic, etc.

Adjust System Configuration:
Adjust the configuration parameters of TiDB and related components according to resource usage to reduce resource contention and improve overall performance.

Hardware Upgrade or Expansion:
If the analysis finds that insufficient hardware resources are the main cause of performance issues, consider upgrading the hardware or adding nodes to distribute the load.

| username: jboracle1981 | Original post link

Sure, could you guide me on how to specifically locate the cause of this issue?

| username: Kongdom | Original post link

Can you find a fast execution plan and a slow execution plan, and post them for comparison?

| username: zhanggame1 | Original post link

The slow and fast runs may be using different execution plans; find both and compare them. There is also the data-caching factor: reading from memory and reading from disk naturally differ in speed.
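One way to check whether the same statement has produced multiple plans is the statements summary tables. A sketch against `information_schema.cluster_statements_summary_history` (the `LIKE` filter is just an illustration):

```sql
-- One row per (statement digest, plan digest); more than one plan_digest
-- for the same digest means the optimizer has been switching plans.
SELECT digest, plan_digest, exec_count, avg_latency
FROM information_schema.cluster_statements_summary_history
WHERE digest_text LIKE '%ads_iot_current_value_minute%'
ORDER BY digest, plan_digest;
```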

| username: jboracle1981 | Original post link

On Monday I found that I couldn’t connect to the client’s production environment from home. In the execution plans, one task type is cop[tiflash] and the other is mpp[tiflash]. I haven’t looked at the others in detail yet.
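A cop[tiflash] plan versus an mpp[tiflash] plan for the same statement suggests the optimizer sometimes chose MPP mode and sometimes plain coprocessor reads. The session variables controlling this can be inspected and, if appropriate, pinned; a sketch (verify the defaults for your v7.1.1 cluster before changing anything):

```sql
-- Inspect the MPP-related switches:
SHOW VARIABLES LIKE 'tidb_%mpp%';

-- Pin the choice so the plan cannot flip between modes:
SET SESSION tidb_allow_mpp = ON;    -- permit MPP plans
SET SESSION tidb_enforce_mpp = ON;  -- prefer MPP even when the cost model disagrees
```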

| username: jboracle1981 | Original post link

Okay, okay.