V6.5.1 delete limit is slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: v6.5.1 delete limit 删除慢

| username: 是我的海

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.1
The cluster was upgraded from 5.4.0 to 6.5.1, and afterwards all business delete statements became so slow that they effectively could not finish.
The table schema (truncated) and the SQL executed are as follows:

  `ocr_content_version` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'OCR data version 0 old version 1 unified version for three ends',
  `is_del` tinyint(4) unsigned NOT NULL DEFAULT '0' COMMENT 'Whether deleted, 1 means deleted',
  `create_time` int(11) NOT NULL DEFAULT '0' COMMENT 'Creation time',
  `update_time` int(11) NOT NULL DEFAULT '0' COMMENT 'Update time',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `idx_owner_flag` (`owner_flag`),
  KEY `idx_create_time` (`create_time`),
  UNIQUE KEY `idx_uniq_sid` (`sid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin /*T![auto_rand_base] AUTO_RANDOM_BASE=13803902764 */ COMMENT='xxx'
1 row in set (0.01 sec)

MySQL [up_ks_pigai]> explain DELETE FROM `tblCorrectRecordScancode` WHERE create_time < 1668287229 limit 1000;
+----------------------------------+---------+-----------+--------------------------------------------------------------------+-------------------------------------------+
| id                               | estRows | task      | access object                                                      | operator info                             |
+----------------------------------+---------+-----------+--------------------------------------------------------------------+-------------------------------------------+
| Delete_5                         | N/A     | root      |                                                                    | N/A                                       |
| └─IndexLookUp_14                 | 1000.00 | root      |                                                                    | limit embedded(offset:0, count:1000)      |
|   ├─Limit_13(Build)              | 1000.00 | cop[tikv] |                                                                    | offset:0, count:1000                      |
|   │ └─IndexRangeScan_11          | 1250.00 | cop[tikv] | table:tblCorrectRecordScancode, index:idx_create_time(create_time) | range:[-inf,1668287229), keep order:false |
|   └─TableRowIDScan_12(Probe)     | 1000.00 | cop[tikv] | table:tblCorrectRecordScancode                                     | keep order:false                          |
+----------------------------------+---------+-----------+--------------------------------------------------------------------+-------------------------------------------+

The delete task starts executing in the middle of the night, and monitoring shows that the memory of the tidb-server increases and the tikv CPU is almost fully utilized. This task has been running for two years and has never had this problem before.

Another issue: I set max_execution_time = 5000, but delete statements that exceed the timeout are not terminated and the connection is not actively closed.

| username: xfworld | Original post link

There are two questions you can look at:

  1. Is the limit 1000 a hard requirement? The execution plan estimates only 1250 matching rows, so could you drop the limit altogether?
  2. The deletion process is divided into two steps: the first step is to query, and the second step is to delete. So, is the slowness due to the query being slow or the deletion being slow? Can you identify this?
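
One rough way to tell the two steps apart (a sketch using the table and column from this thread; the ID list in step 2 is purely illustrative) is to run them separately and time each:

```sql
-- Step 1: the "query" half -- fetch the row handles through the index only.
SELECT id FROM tblCorrectRecordScancode
WHERE create_time < 1668287229 LIMIT 1000;

-- Step 2: the "delete" half -- delete by primary key using the ids from step 1.
-- (the id list here is shortened for illustration)
DELETE FROM tblCorrectRecordScancode WHERE id IN (1, 2, 3);
```

If step 1 alone already takes minutes, the bottleneck is the index scan rather than the delete itself.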

A max_execution_time timeout rolls the statement back, but that has no necessary connection with the session being disconnected…

If the session disappeared just because of a timeout rollback, wouldn’t that be even stranger?

| username: dbaspace | Original post link

Impressive. Please keep upgrading.

| username: dbaspace | Original post link

Deleting with a limit should be reasonable. This is basically how it’s done online. There are two issues with this:

  1. Too many keys are being scanned.
  2. The parallelism is only 1.

| username: 是我的海 | Original post link

  1. This table has billions of rows, and the script deletes in batches; it actually uses limit 5000. The statistics were collected just this morning.

  2. Can the specific cause of the slow deletion be identified from the execution plan information in the above image? This SQL execution plan uses the correct index. I reran a delete and captured the information below. Could you check whether the problem can be roughly located, and why the index lookup takes so long?

MySQL [up_ks_pigai]> explain analyze DELETE FROM `tblCorrectRecordScancode` force index(idx_create_time) WHERE create_time < 1668287229 limit 5000;

+----------------------------------+---------+---------+-----------+--------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+------+
| id                               | estRows | actRows | task      | access object                                                      | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | operator info                             | memory  | disk |
+----------------------------------+---------+---------+-----------+--------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+------+
| Delete_5                         | N/A     | 0       | root      |                                                                    | time:2m16.9s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | N/A                                       | 4.72 MB | N/A  |
| └─IndexLookUp_14                 | 5000.00 | 5000    | root      |                                                                    | time:2m16.8s, loops:6, index_task: {total_time: 2m16.7s, fetch_handle: 2m16.7s, build: 2.29µs, wait: 7.8µs}, table_task: {total_time: 220.2ms, num: 3, concurrency: 5}, next: {wait_index: 2m16.7s, wait_table_lookup_build: 568.5µs, wait_table_lookup_resp: 28.6ms}                                                                                                                                                                                                                                                                                                                                                                                          | limit embedded(offset:0, count:5000)      | 23.5 MB | N/A  |
|   ├─Limit_13(Build)              | 5000.00 | 6752    | cop[tikv] |                                                                    | time:2m16.7s, loops:6, cop_task: {num: 343, max: 1.94s, min: 276.5µs, avg: 398.4ms, p95: 793.4ms, max_proc_keys: 3040, p95_proc_keys: 0, tot_proc: 2m16.1s, tot_wait: 8ms, rpc_num: 343, rpc_time: 2m16.6s, copr_cache_hit_ratio: 0.14, distsql_concurrency: 1}, tikv_task:{proc max:1.94s, min:0s, avg: 456.5ms, p80:618ms, p95:793ms, iters:363, tasks:343}, scan_detail: {total_process_keys: 6752, total_process_keys_size: 310592, total_keys: 441734885, get_snapshot_time: 16.4ms, rocksdb: {delete_skipped_count: 57658013, key_skipped_count: 547864360, block: {cache_hit_count: 333684, read_count: 29, read_byte: 1.34 MB, read_time: 822.3µs}}}   | offset:0, count:5000                      | N/A     | N/A  |
|   │ └─IndexRangeScan_11          | 6250.00 | 6752    | cop[tikv] | table:tblCorrectRecordScancode, index:idx_create_time(create_time) | tikv_task:{proc max:1.94s, min:0s, avg: 456.4ms, p80:618ms, p95:793ms, iters:363, tasks:343}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | range:[-inf,1668287229), keep order:false | N/A     | N/A  |
|   └─TableRowIDScan_12(Probe)     | 5000.00 | 5000    | cop[tikv] | table:tblCorrectRecordScancode                                     | time:217.2ms, loops:9, cop_task: {num: 95, max: 61.1ms, min: 1.02ms, avg: 4.96ms, p95: 11ms, max_proc_keys: 127, p95_proc_keys: 92, tot_proc: 145ms, rpc_num: 95, rpc_time: 469.7ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, tikv_task:{proc max:4ms, min:0s, avg: 1.53ms, p80:2ms, p95:3ms, iters:179, tasks:95}, scan_detail: {total_process_keys: 5000, total_process_keys_size: 22088846, total_keys: 5538, get_snapshot_time: 1.86ms, rocksdb: {delete_skipped_count: 297, key_skipped_count: 1160, block: {cache_hit_count: 54287, read_count: 13, read_byte: 526.3 KB, read_time: 405.3µs}}}                                              | keep order:false                          | N/A     | N/A  |
+----------------------------------+---------+---------+-----------+--------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+------+
  3. Did I misunderstand the max_execution_time parameter? My understanding was that if a SQL statement runs longer than 5 seconds, it is terminated and returns an “interrupted” error. Was this delete not terminated because the time spent waiting on the index lookup is not counted toward the execution time?

From the execution plan, the slowness is in the cop_task. So how can it be optimized further? I have hit a wall here :joy:
cop_task: {num: 343, max: 1.95s, min: 712.6µs, avg: 460.7ms, p95: 821.8ms, max_proc_keys: 3040, p95_proc_keys: 0, tot_proc: 2m37.2s, tot_wait: 24ms, rpc_num: 343, rpc_time: 2m38s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 1}
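
For reference, the scan_detail above (total_keys: 441734885 against total_process_keys: 6752) suggests the index range scan is wading through MVCC tombstones left by earlier delete batches. One common workaround, sketched here with an illustrative `@last_ts` watermark maintained by the application (the variable name and batching scheme are assumptions, not from the thread), is to advance the lower bound of the range on each batch so the scan starts past the already-deleted versions instead of re-scanning from `-inf`:

```sql
-- Sketch of a rolling-watermark batch delete. @last_ts starts at 0 and is
-- advanced after every batch, so the index range scan no longer re-walks
-- the tombstones of rows deleted in previous batches.
SET @last_ts = 0;

-- Find the upper bound of this batch via the index.
SELECT MAX(create_time) INTO @batch_max
FROM (
  SELECT create_time
  FROM tblCorrectRecordScancode
  WHERE create_time >= @last_ts AND create_time < 1668287229
  ORDER BY create_time
  LIMIT 5000
) AS batch;

-- Delete only within [@last_ts, @batch_max].
DELETE FROM tblCorrectRecordScancode
WHERE create_time >= @last_ts AND create_time <= @batch_max
LIMIT 5000;

-- The application then sets @last_ts = @batch_max before the next batch;
-- duplicate create_time values at the boundary are re-covered next round.
```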

| username: 是我的海 | Original post link

Are you doubting the capabilities of TiDB? :thinking:

| username: dbaspace | Original post link

Remember the quote from Lu Xun: “Doubt is not a flaw. Always doubting without making a conclusion, that is the flaw.”

| username: xfworld | Original post link

Is partitioning by time more suitable for your current scenario?

When deleting, you would only need to operate on the data within a given partition, and instead of DELETE you could simply truncate the partition.


@dbaspace has already pointed out your problem, too many keys :upside_down_face:

| username: 是我的海 | Original post link

TiDB partitioned tables have too many limitations, and besides, the business has been running for so long that you can’t just ask people to restructure their tables. Also, why was everything fine before the upgrade but not after? Apart from the too-many-keys issue, is there any other room for optimization?

| username: xfworld | Original post link

Properly tuning GC, or running manual compaction more often, can reduce the number of retained MVCC versions.

Automatic reference:

Manual reference:


See which one suits your scenario better. Once the number of data versions is reduced, the number of keys scanned also drops, the scan speed naturally improves, and the problem is alleviated.
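
As a concrete starting point for the GC direction (the value shown is the default, not a recommendation; verify against your own retention needs):

```sql
-- A long GC life time keeps old MVCC versions around, which inflates the
-- total_keys an index scan has to step over. '10m' is the default.
SHOW VARIABLES LIKE 'tidb_gc_life_time';
SET GLOBAL tidb_gc_life_time = '10m';
```

Manual compaction of the write column family on TiKV (via tikv-ctl’s compact subcommand) can then physically drop the tombstones that GC has already marked.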

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.