SQL Query Instability in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb的sql查询不稳定

| username: lxzkenney

tidb 5.0.3
SQL:
desc SELECT
    summary_create_time AS summary_create_time,
    encrypt_mobile AS encrypt_mobile,
    first_class_content AS first_class_content,
    second_class_content AS second_class_content,
    third_class_content AS third_class_content,
    forth_class_content AS forth_class_content
FROM shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d
WHERE 1 = 1
    AND encrypt_mobile = 'xxxxxxxxxxx'
ORDER BY summary_create_time DESC
LIMIT 0, 150;
The execution plan for this SQL looks fine, and the statistics health is 100.

The SQL is a simple indexed query, and it filters down to only a handful of rows. There is no record of this SQL in the slow query log, yet around 60 executions per day take about 2 seconds to respond, which triggers alerts. No anomalies have been found in the application-side trace so far. From the database side it is unclear whether there are lock waits or lock conflicts. How should this be investigated, and which metrics should be checked to determine whether there is a read lock wait issue?
From the database side, the access volume is small and there is no pressure. However, the table dwb_risk_cs_service_summary_full_d is read and written frequently and holds about 10 million rows.
There are indeed some write-write conflicts in the logs, but it is unclear whether they affect read operations.

The CPU usage in the past two days has been very low, and the access volume is not large.

This chart shows the Error Keys metric. It doesn't seem detailed enough to determine whether there are read-write conflicts.

| username: 我是咖啡哥 | Original post link

Frequent DML can hurt read efficiency. In the SQL statement details you can see that a large number of MVCC versions are being scanned.
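To check this from SQL, here is a rough sketch against the statements summary tables (the table and column names below, such as avg_processed_keys and avg_total_keys, are what recent TiDB versions expose; verify them on your version). If avg_total_keys is much larger than avg_processed_keys, the scan is stepping over many old MVCC versions or deleted keys left behind by frequent DML:

-- Compare processed keys vs. total keys scanned for statements touching this table.
SELECT digest_text,
       exec_count,
       avg_latency,
       avg_processed_keys,
       avg_total_keys
FROM information_schema.cluster_statements_summary_history
WHERE digest_text LIKE '%dwb_risk_cs_service_summary_full_d%'
ORDER BY avg_total_keys DESC
LIMIT 10;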

| username: xingzhenxiang | Original post link

Try changing the index to encrypt_mobile+summary_create_time.
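A sketch of the DDL this suggests (the index name index_encrypt_time matches the one that shows up in the later execution plan; adjust as needed):

ALTER TABLE shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d
    ADD INDEX index_encrypt_time (encrypt_mobile, summary_create_time);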

| username: 裤衩儿飞上天 | Original post link

Post the results of the explain analyze for us to take a look.

| username: lxzkenney | Original post link

explain_analyze.txt (5.5 KB)

mysql> explain analyze SELECT summary_create_time AS summary_create_time, encrypt_mobile AS encrypt_mobile, first_class_content AS first_class_content, second_class_content AS second_class_content, third_class_content AS third_class_content, forth_class_content AS forth_class_content FROM shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d WHERE 1 = 1 AND encrypt_mobile = 'QbZD73YfQvq22tytWDo4Cw==' ORDER BY summary_create_time DESC LIMIT 0, 150;
+----------------------------------+---------+---------+-----------+---------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------------+---------+---------+-----------+---------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+-----------+------+
| TopN_10 | 2.00 | 0 | root | | time:517.9µs, loops:1 | shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.summary_create_time:desc, offset:0, count:150 | 0 Bytes | N/A |
| └─IndexLookUp_28 | 2.00 | 0 | root | | time:509.6µs, loops:2, table_task: {total_time: 7.61ms, num: 0, concurrency: 16} | | 207 Bytes | N/A |
| ├─IndexRangeScan_26(Build) | 2.00 | 0 | cop[tikv] | table:dwb_risk_cs_service_summary_full_d, index:index_encrypt_mobile(encrypt_mobile) | time:456.1µs, loops:1, cop_task: {num: 1, max: 434.8µs, proc_keys: 0, rpc_num: 1, rpc_time: 420µs, copr_cache_hit_ratio: 0.00}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 0, total_keys: 1, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 11, read_count: 0, read_byte: 0 Bytes}}} | range:["QbZD73YfQvq22tytWDo4Cw==","QbZD73YfQvq22tytWDo4Cw=="], keep order:false | N/A | N/A |
| └─TableRowIDScan_27(Probe) | 2.00 | 0 | cop[tikv] | table:dwb_risk_cs_service_summary_full_d | | keep order:false | N/A | N/A |
+----------------------------------+---------+---------+-----------+---------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+-----------+------+
4 rows in set (0.00 sec)

| username: 裤衩儿飞上天 | Original post link

  1. When the SQL execution is slow, you can check the DATA_LOCK_WAITS table, which holds the current lock wait information (a query sketch follows this list).
    You can refer to: TiDB Lock Conflict Troubleshooting | PingCAP Docs

  2. Specifically, this issue should be addressed from the business logic layer to see where the conflict is occurring.
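A minimal sketch for point 1 (note that DATA_LOCK_WAITS was only added in TiDB v5.1, so it may not exist on v5.0.3, and it only records pessimistic-lock waits; the column list is an assumption to check against your version):

-- Run this while a statement is stuck to see which transaction currently holds the lock.
SELECT `KEY`, trx_id, current_holding_trx_id, sql_digest
FROM information_schema.data_lock_waits;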

| username: tidb菜鸟一只 | Original post link

The execution time of this SQL obviously depends on the value of encrypt_mobile. Find the encrypt_mobile values that match a large number of rows and check the execution time of this SQL for those values, for example with a query like the one below.
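A rough sketch for spotting that kind of skew (note this is a full scan of a roughly 10-million-row table, so run it off-peak), then EXPLAIN ANALYZE the original query with one of the top values:

-- Find the most frequent encrypt_mobile values to test the worst case.
SELECT encrypt_mobile, COUNT(*) AS cnt
FROM shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d
GROUP BY encrypt_mobile
ORDER BY cnt DESC
LIMIT 10;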

| username: xingzhenxiang | Original post link

Is it difficult to create a composite index? I think creating a composite index would solve the problem.

| username: 胡杨树旁 | Original post link

Is the data distribution uneven?

| username: lxzkenney | Original post link

With the composite index, the execution plan feels like it takes slightly longer; one more operator step was added.
+------------------------------------+---------+---------+-----------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+------------------------------------+---------+---------+-----------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| Limit_13 | 2.00 | 0 | root | | time:497.9µs, loops:1 | offset:0, count:150 | N/A | N/A |
| └─Projection_38 | 2.50 | 0 | root | | time:497.1µs, loops:1, Concurrency:OFF | shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.summary_create_time, shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.encrypt_mobile, shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.first_class_content, shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.second_class_content, shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.third_class_content, shuidi_data_warehouse.dwb_risk_cs_service_summary_full_d.forth_class_content | 3.79 KB | N/A |
| └─IndexLookUp_37 | 2.50 | 0 | root | | time:496.3µs, loops:1, table_task: {total_time: 7.43ms, num: 0, concurrency: 16} | | 216 Bytes | N/A |
| ├─Limit_36(Build) | 2.50 | 0 | cop[tikv] | | time:445.2µs, loops:1, cop_task: {num: 1, max: 405µs, proc_keys: 0, rpc_num: 1, rpc_time: 387.6µs, copr_cache_hit_ratio: 0.00}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 0, total_keys: 1, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 16, read_count: 0, read_byte: 0 Bytes}}} | offset:0, count:150 | N/A | N/A |
| │ └─IndexRangeScan_34 | 2.50 | 0 | cop[tikv] | table:dwb_risk_cs_service_summary_full_d, index:index_encrypt_time(encrypt_mobile, summary_create_time) | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}} | range:["QbZD73YfQvq22tytWDo4Cw==","QbZD73YfQvq22tytWDo4Cw=="], keep order:true, desc | N/A | N/A |
| └─TableRowIDScan_35(Probe) | 2.50 | 0 | cop[tikv] | table:dwb_risk_cs_service_summary_full_d | | keep order:false, stats:pseudo | N/A | N/A |
+------------------------------------+---------+---------+-----------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
6 rows in set (0.00 sec)

| username: xingzhenxiang | Original post link

This… theoretically, filtering and sorting are both done on the index, so it should be faster.

| username: army | Original post link

Check if there are writes updating the same rows when the query is slow. It seems like a read-write conflict. You can look at the execution plan when it is slow to see if the resolveLock wait time is relatively long.
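If the slow runs do land in the slow query log (the threshold can be lowered via the tidb_slow_log_threshold variable), a sketch like this can pull them out; resolve_lock_time, backoff_time, and backoff_types are the slow-log fields I would expect here, but verify the column names on your version with DESC information_schema.cluster_slow_query:

-- Large resolve_lock_time or backoff_time on the slow runs points to read-write conflicts.
SELECT time, query_time, resolve_lock_time, backoff_time, backoff_types
FROM information_schema.cluster_slow_query
WHERE query LIKE '%dwb_risk_cs_service_summary_full_d%'
  AND query_time > 1
ORDER BY query_time DESC
LIMIT 20;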

| username: lxzkenney | Original post link

I checked the monitoring at the corresponding time. Is this considered a read-write conflict?