Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: GC和compact的几个问题 (Several questions about GC and compaction)
【TiDB Usage Environment】Production Environment / Test / Poc
Production Environment
【TiDB Version】
v5.1.2
【Reproduction Path】What operations were performed when the issue occurred
【Encountered Issues: Issue Phenomenon and Impact】
【Issue 1】
After a key has been GC'd but before compaction has run, does a SQL query still have to scan that key, or does it skip it directly? If it is skipped, how is that achieved, and what is the underlying principle?
【Issue 2】
The following interface returns the multi-version (MVCC) data of a row, but its output does not show whether a version has already been GC'd. The versions only disappear from the output completely after both GC and compaction have run.
curl http://192.168.1.1:10080/mvcc/key/dbname/table_name/343930187735040
Is there an interface to check whether the multi-version data has been GC’d?
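One indirect way to reason about this is to compare the timestamp of each version with the GC safe point. The sketch below is only an illustration, not an official interface: it assumes you have a commit timestamp (TSO) for a version, for example from the MVCC output above, and the GC safe point as a wall-clock time (in TiDB this is recorded as tikv_gc_safe_point in the mysql.tidb table). The function names are hypothetical. It relies on the TSO layout in which the low 18 bits are a logical counter and the remaining high bits are the physical time in milliseconds.

```python
from datetime import datetime, timedelta, timezone

# A TSO packs a physical time in milliseconds into the high bits and a
# logical counter into the low 18 bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tso_to_datetime(tso: int) -> datetime:
    """Return the UTC wall-clock time encoded in a TSO."""
    physical_ms = tso >> LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=physical_ms)


def older_than_safe_point(commit_ts: int, safe_point: datetime) -> bool:
    """True if the version was committed before the GC safe point.

    Hypothetical helper: it only compares times. It does not tell you
    whether GC has actually processed the key.
    """
    return tso_to_datetime(commit_ts) < safe_point


if __name__ == "__main__":
    # Example TSO built from an arbitrary physical time, for illustration only.
    example_ts = 1669267277441 << LOGICAL_BITS
    safe_point = datetime(2022, 11, 25, tzinfo=timezone.utc)  # assumed value
    print(tso_to_datetime(example_ts), older_than_safe_point(example_ts, safe_point))
```

Note that GC keeps the newest version committed at or before the safe point (unless that version is a delete), so one version older than the safe point is expected to remain visible. Several such versions of the same key would suggest that GC has not yet handled it.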
【Issue 3】
bug-11217: Multi-key GC call causes GC to not work, leaving a large number of historical versions
To work around this issue, we set gc.enable-compaction-filter: false to disable TiKV's compaction filter GC and fall back to the old GC mode for multi-version GC.
However, this has not helped much. Here is a specific example:
The following SQL was executed on November 29th. It uses a covering index to find 100 rows of data, and create_time<=1669267277441 corresponds to create_time<='2022-11-24 13:21:17'.
We had already disabled the GC compaction filter and switched to the old GC method on November 23rd, yet the scan still touched far more keys than it returned (see total_keys and key_skipped_count in the execution info below).
desc analyze SELECT id FROM table_xxx FORCE INDEX(idx_essyncstate_bizproduct) WHERE (biz_id=300 AND es_sync_state=1 AND product_id=300 AND create_time<=1669267277441) ORDER BY create_time DESC LIMIT 100\G
*************************** 4. row ***************************
id: └─Limit_22
estRows: 4.12
actRows: 189
task: cop[tikv]
access object:
execution info: tikv_task:{proc max:485ms, min:70ms, p80:412ms, p95:456ms, iters:46, tasks:43}, scan_detail: {total_process_keys: 189, total_keys: 34711190, rocksdb: {delete_skipped_count: 610, key_skipped_count: 34712398, block: {cache_hit_count: 29983, read_count: 1866, read_byte: 38.9 MB}}}
operator info: offset:0, count:100
memory: N/A
disk: N/A
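To put a number on the scan overhead, the counters in the execution info above can be compared directly. The following is a minimal sketch that parses the scan_detail text copied from the plan and computes how many keys were scanned per key actually processed. The helper names are hypothetical, and the ratio is only a rough indicator, not an official metric.

```python
import re
from typing import Dict

# scan_detail text copied verbatim from the execution info above.
SCAN_DETAIL = (
    "scan_detail: {total_process_keys: 189, total_keys: 34711190, "
    "rocksdb: {delete_skipped_count: 610, key_skipped_count: 34712398, "
    "block: {cache_hit_count: 29983, read_count: 1866, read_byte: 38.9 MB}}}"
)


def parse_counters(text: str) -> Dict[str, int]:
    """Extract every `name: <integer>` pair; values with units are skipped."""
    pairs = re.findall(r"(\w+): (\d+)(?=[,}])", text)
    return {name: int(value) for name, value in pairs}


def keys_scanned_per_key_processed(counters: Dict[str, int]) -> float:
    """Ratio of total_keys to total_process_keys (scan amplification)."""
    return counters["total_keys"] / counters["total_process_keys"]


if __name__ == "__main__":
    counters = parse_counters(SCAN_DETAIL)
    print(counters)
    print(round(keys_scanned_per_key_processed(counters)))
```

For this plan the ratio is roughly 183,657 keys scanned for each key processed, which is why the request takes hundreds of milliseconds even though fewer than 200 rows are returned.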
【Issue 4】
show config where type='tikv' and name like '%enable-compaction-filter%';
set config tikv `gc.enable-compaction-filter`=false;
In v5.1.2, does turning off this parameter completely avoid the GC bug? Are there other risks, for example if a large number of production clusters run with the old GC method?
Since the official recommendation is to avoid modifying parameters online, is tikv-ctl the recommended way to disable the new GC method?
【Issue 5】
What do delete_skipped_count and key_skipped_count in the execution plan mean?
I have found several conflicting explanations of these two fields and still don't understand what they mean. Could someone explain them so that issues like the one above are easier to analyze?
Version 1
delete_skipped_count: Indicates that the key has been GC’d, status is tombstone, but not yet compacted.
key_skipped_count: Indicates that the same key has multiple MVCC versions, and the GC time has not passed.
Version 2
Rocksdb_delete_skipped_count: The number of deleted keys scanned during RocksDB data reading.
Rocksdb_key_skipped_count: The number of deleted (tombstone) keys encountered during RocksDB data scanning.
Version 3
In a post on asktug
Here is a test:
create table t123 (id int primary key, name varchar(100), age int, city varchar(100));
alter table t123 add index idx_name(name);
insert into t123 values (1,'user1',1,'bj');
insert into t123 values (2,'user2',2,'bj');
insert into t123 values (3,'user3',3,'bj');
insert into t123 values (4,'user4',4,'bj');
insert into t123 values (5,'user5',5,'bj');
insert into t123 values (6,'user6',6,'bj');
insert into t123 values (7,'user7',7,'bj');
insert into t123 values (8,'user8',8,'bj');
insert into t123 values (9,'user9',9,'bj');
insert into t123 values (10,'user10',10,'bj');
desc analyze select name from t123 where name='user1'\G
With 1 row of name='user1' data in the table and no updates or deletions performed, part of the execution plan is as follows:
scan_detail: {total_process_keys: 1, total_keys: 2, rocksdb: {delete_skipped_count: 1, key_skipped_count: 2
With 3 rows of name='user1' data in the table and no updates or deletions performed, part of the execution plan is as follows:
scan_detail: {total_process_keys: 3, total_keys: 4, rocksdb: {delete_skipped_count: 3, key_skipped_count: 6
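The two results can be compared per processed key. The sketch below only restates the figures from the two runs above (the function and dictionary names are hypothetical) and does not try to explain why the counters behave this way.

```python
# Figures copied from the two test runs above. The output there is truncated,
# so only these four counters are available.
RUNS = {
    "1 matching row": {
        "total_process_keys": 1,
        "total_keys": 2,
        "delete_skipped_count": 1,
        "key_skipped_count": 2,
    },
    "3 matching rows": {
        "total_process_keys": 3,
        "total_keys": 4,
        "delete_skipped_count": 3,
        "key_skipped_count": 6,
    },
}


def normalize(run):
    """Express the skip counters relative to the number of processed keys."""
    n = run["total_process_keys"]
    return {
        "delete_skipped_per_key": run["delete_skipped_count"] / n,
        "key_skipped_per_key": run["key_skipped_count"] / n,
        "keys_beyond_processed": run["total_keys"] - n,
    }


if __name__ == "__main__":
    for label, run in RUNS.items():
        print(label, normalize(run))
```

In both runs delete_skipped_count equals total_process_keys and key_skipped_count is twice that, even though no row was updated or deleted. This is hard to reconcile with an explanation in which these counters only count GC'd or deleted versions.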
Thank you all
【Resource Configuration】
【Attachments: Screenshots / Logs / Monitoring】