Manual Compaction in TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv 手动compact

| username: 田帅萌7

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.0.6
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
Performed a manual compact.
Cluster size: 3 TiKV nodes, total capacity 1.9 T, about 680 G per TiKV.
Started execution at 11:30 AM today:

tiup ctl:v5.0.6 tikv --pd compact-cluster -d kv --threads 32

Afterwards, the space used on a single node doubled.

| username: zhanggame1 | Original post link

The compact may not have finished yet. While it is running, it can occupy a huge amount of extra disk space.

| username: TiDBer_jYQINSnf | Original post link

Check the compaction logs in rocksdb.info. It records compaction jobs, and you can track progress by job ID.

The compact process works as follows:
Traverse the files in level 0 and level 1, take the SST files within the key range as input, then read, merge, and write the output to level 1, and finally delete the input SST files.
Then repeat from level 1 to level 2…
And so on, down to the last level.
Normally, disk usage should decrease gradually, so this doubling doesn't look right. Check the logs to see what's going on.
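The level-by-level process above can be spot-checked from the log: each compaction job writes a compaction_started and, later, a compaction_finished event into rocksdb.info (the EVENT_LOG_v1 lines shown later in this thread), so counting the two gives a rough number of jobs still in flight. A minimal sketch; the heredoc fakes a log excerpt, so in practice point LOG at the real rocksdb.info:

```shell
# Count in-flight compaction jobs from RocksDB's event log.
# The sample lines below are illustrative; set LOG to your real rocksdb.info.
LOG=sample_rocksdb_events.info
cat > "$LOG" <<'EOF'
EVENT_LOG_v1 {"time_micros": 1, "job": 161, "event": "compaction_started"}
EVENT_LOG_v1 {"time_micros": 2, "job": 161, "event": "compaction_finished"}
EVENT_LOG_v1 {"time_micros": 3, "job": 162, "event": "compaction_started"}
EOF
started=$(grep -c '"event": "compaction_started"' "$LOG")
finished=$(grep -c '"event": "compaction_finished"' "$LOG")
echo "started=$started finished=$finished in_flight=$((started - finished))"
```

For the sample lines above this prints `started=2 finished=1 in_flight=1`; a persistently large in-flight count suggests compactions are still churning.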

| username: 田帅萌7 | Original post link

I found the job ID. How can I query based on the job ID?

| username: 有猫万事足 | Original post link

I'm not sure how to check by job ID, but I found in the RocksDB documentation that compaction status is dumped to the log every 10 minutes.
I looked in rocksdb.info and did find similar log lines; I hope this helps.
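As a sketch of pulling those periodic dumps out of the log: RocksDB writes per-column-family "Compaction Stats" tables into its info log, and a simple grep with trailing context shows them. The heredoc here is a faked, abbreviated excerpt (the real tables have many more columns); point LOG at the actual rocksdb.info:

```shell
# Extract the periodic "Compaction Stats" dumps from rocksdb.info.
# The heredoc is an illustrative excerpt; set LOG to the real log file.
LOG=sample_rocksdb_stats.info
cat > "$LOG" <<'EOF'
** Compaction Stats [default] **
Level    Files   Size     Score
  L0      1/0    2.4 MB    0.5
  L6      8/0    1.2 GB    0.0
EOF
# Print each stats header plus the lines that follow it.
grep -A 3 'Compaction Stats' "$LOG"
```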

| username: 田帅萌7 | Original post link

After executing compact, the execution plan becomes unstable.

MySQL [(none)]> use hotel_product;
Database changed
MySQL [hotel_product]> explain ANALYZE select * from prefetch_key_info limit 100;
+----------------------------+---------+---------+-----------+-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| id                         | estRows | actRows | task      | access object           | execution info                                                                                                                                                                                                                                                                                  | operator info       | memory  | disk |
+----------------------------+---------+---------+-----------+-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| Limit_7                    | 100.00  | 100     | root      |                         | time:3m2.9s, loops:2                                                                                                                                                                                                                                                                            | offset:0, count:100 | N/A     | N/A  |
| └─TableReader_12           | 100.00  | 100     | root      |                         | time:3m2.9s, loops:1, cop_task: {num: 806, max: 742.3ms, min: 922.4µs, avg: 226.8ms, p95: 448.7ms, max_proc_keys: 100, p95_proc_keys: 0, tot_proc: 2m47.9s, tot_wait: 14.2s, rpc_num: 808, rpc_time: 3m2.8s, copr_cache_hit_ratio: 0.00}, backoff{regionMiss: 2ms}                           | data:Limit_11       | 18.9 KB | N/A  |
|   └─Limit_11               | 100.00  | 100     | cop[tikv] |                         | tikv_task:{proc max:620ms, min:0s, p80:298ms, p95:423ms, iters:808, tasks:806}, scan_detail: {total_process_keys: 100, total_keys: 423330044, rocksdb: {delete_skipped_count: 51187119, key_skipped_count: 474516358, block: {cache_hit_count: 6711, read_count: 294819, read_byte: 829.3 MB}}} | offset:0, count:100 | N/A     | N/A  |
|     └─TableFullScan_10     | 100.00  | 100     | cop[tikv] | table:prefetch_key_info | tikv_task:{proc max:620ms, min:0s, p80:298ms, p95:423ms, iters:808, tasks:806}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}}                                 | keep order:false    | N/A     | N/A  |
+----------------------------+---------+---------+-----------+-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
4 rows in set (3 min 3.78 sec)

MySQL [hotel_product]> ANALYZE table prefetch_key_info;
Query OK, 0 rows affected (20 min 43.13 sec)

MySQL [hotel_product]> explain ANALYZE select * from prefetch_key_info limit 100;
+----------------------------+---------+---------+-----------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| id                         | estRows | actRows | task      | access object           | execution info                                                                                                                                                                                                                                                                              | operator info       | memory  | disk |
+----------------------------+---------+---------+-----------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| Limit_7                    | 100.00  | 100     | root      |                         | time:2m37s, loops:2                                                                                                                                                                                                                                                                         | offset:0, count:100 | N/A     | N/A  |
| └─TableReader_12           | 100.00  | 100     | root      |                         | time:2m37s, loops:1, cop_task: {num: 802, max: 675.2ms, min: 518.6µs, avg: 195.7ms, p95: 431.4ms, max_proc_keys: 100, p95_proc_keys: 0, tot_proc: 2m24.8s, tot_wait: 11.4s, rpc_num: 802, rpc_time: 2m36.9s, copr_cache_hit_ratio: 0.08}                                                   | data:Limit_11       | 18.9 KB | N/A  |
|   └─Limit_11               | 100.00  | 100     | cop[tikv] |                         | tikv_task:{proc max:574ms, min:0s, p80:280ms, p95:404ms, iters:804, tasks:802}, scan_detail: {total_process_keys: 100, total_keys: 387235317, rocksdb: {delete_skipped_count: 47320693, key_skipped_count: 434555272, block: {cache_hit_count: 276457, read_count: 18, read_byte: 382.9 KB}}} | offset:0, count:100 | N/A     | N/A  |
|     └─TableFullScan_10     | 100.00  | 100     | cop[tikv] | table:prefetch_key_info | tikv_task:{proc max:574ms, min:0s, p80:280ms, p95:404ms, iters:804, tasks:802}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}}                                 | keep order:false    | N/A     | N/A  |
+----------------------------+---------+---------+-----------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
4 rows in set (2 min 37.80 sec)

MySQL [hotel_product]> explain ANALYZE select * from prefetch_key_info limit 100;
+----------------------------+---------+---------+-----------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| id                         | estRows | actRows | task      | access object           | execution info                                                                                                                                                                                                                                                                           | operator info       | memory  | disk |
+----------------------------+---------+---------+-----------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| Limit_7                    | 100.00  | 100     | root      |                         | time:7.7s, loops:2                                                                                                                                                                                                                                                                       | offset:0, count:100 | N/A     | N/A  |
| └─TableReader_12           | 100.00  | 100     | root      |                         | time:7.7s, loops:1, cop_task: {num: 802, max: 433.9ms, min: 393.4µs, avg: 9.55ms, p95: 2.06ms, max_proc_keys: 100, p95_proc_keys: 0, tot_proc: 7.03s, tot_wait: 293ms, rpc_num: 804, rpc_time: 7.65s, copr_cache_hit_ratio: 0.95}, backoff{regionMiss: 2ms}                           | data:Limit_11       | 18.9 KB | N/A  |
|   └─Limit_11               | 100.00  | 100     | cop[tikv] |                         | tikv_task:{proc max:574ms, min:0s, p80:280ms, p95:404ms, iters:804, tasks:802}, scan_detail: {total_process_keys: 100, total_keys: 19583148, rocksdb: {delete_skipped_count: 1814115, key_skipped_count: 21397225, block: {cache_hit_count: 13487, read_count: 7, read_byte: 316.2 KB}}} | offset:0, count:100 | N/A     | N/A  |
|     └─TableFullScan_10     | 100.00  | 100     | cop[tikv] | table:prefetch_key_info | tikv_task:{proc max:574ms, min:0s, p80:280ms, p95:404ms, iters:804, tasks:802}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}}                                 | keep order:false    | N/A     | N/A  |
+----------------------------+---------+---------+-----------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
4 rows in set (7.74 sec)

MySQL [hotel_product]> explain ANALYZE select * from prefetch_key_info limit 100;
+----------------------------+---------+---------+-----------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| id                         | estRows | actRows | task      | access object           | execution info                                                                                                                                                                                                                                                  | operator info       | memory  | disk |
+----------------------------+---------+---------+-----------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
| Limit_7                    | 100.00  | 100     | root      |                         | time:709.4ms, loops:2                                                                                                                                                                                                                                            | offset:0, count:100 | N/A     | N/A  |
| └─TableReader_12           | 100.00  | 100     | root      |                         | time:709.4ms, loops:1, cop_task: {num: 802, max: 4.63ms, min: 381.5µs, avg: 846.7µs, p95: 1.76ms, tot_proc: 2ms, tot_wait: 366ms, rpc_num: 802, rpc_time: 671.2ms, copr_cache_hit_ratio: 1.00}                                                                  | data:Limit_11       | 18.9 KB | N/A  |
|   └─Limit_11               | 100.00  | 100     | cop[tikv] |                         | tikv_task:{proc max:574ms, min:0s, p80:280ms, p95:404ms, iters:804, tasks:802}, scan_detail: {total_process_keys: 0, total_keys: 3, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 40, read_count: 3, read_byte: 143.9 KB}}} | offset:0, count:100 | N/A     | N/A  |
|     └─TableFullScan_10     | 100.00  | 100     | cop[tikv] | table:prefetch_key_info | tikv_task:{proc max:574ms, min:0s, p80:280ms, p95:404ms, iters:804, tasks:802}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}}   | keep order:false    | N/A     | N/A  |
+----------------------------+---------+---------+-----------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------+------+
4 rows in set (0.74 sec)

| username: h5n1 | Original post link

Compact like this:

tikv-ctl --host xxxxx:20160 compact -c write -d kv --bottommost force
tikv-ctl --host xxxx:20160 compact -c default -d kv --bottommost force

| username: TiDBer_jYQINSnf | Original post link

cat rocksdb.info |grep EVENT_LOG_v|grep "job\": 161"

This filters all event_log entries with job=161. The log records compaction_started, table_file_creation, table_file_deletion, and compaction_finished events, so you can see the full execution of this compaction.

cat rocksdb.info |grep EVENT_LOG_v|grep "job\": 161"
[2024/01/24 10:07:21.743 +08:00][4][INFO] EVENT_LOG_v1 {"time_micros": 1706062041741309, "job": 161, "event": "compaction_started", "compaction_reason": "LevelL0FilesNum", "files_L0": [199], "files_L6": [188], "score": 1, "input_data_size": 4344}
[2024/01/24 10:07:21.744 +08:00][4][INFO] EVENT_LOG_v1 {"time_micros": 1706062041743394, "cf_name": "raft", "job": 161, "event": "table_file_creation", "file_number": 229, "file_size": 3323, "table_properties": {"data_size": 2487, "index_size": 38, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 0, "raw_key_size": 912, "raw_average_key_size": 19, "raw_value_size": 1807, "raw_average_value_size": 37, "num_data_blocks": 1, "num_entries": 48, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "raft", "column_family_id": 3, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "NoopSliceTransform", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1706062020, "oldest_key_time": 0, "file_creation_time": 1706062041}}
[2024/01/24 10:07:21.745 +08:00][4][INFO] (Original Log Time 2024/01/24-10:07:21.744843) EVENT_LOG_v1 {"time_micros": 1706062041744822, "job": 161, "event": "compaction_finished", "compaction_time_micros": 2235, "compaction_time_cpu_micros": 1814, "output_level": 6, "num_output_files": 1, "total_output_size": 3323, "num_input_records": 50, "num_output_records": 48, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 0, 0, 0, 1]}
[2024/01/24 10:07:21.747 +08:00][4][INFO] EVENT_LOG_v1 {"time_micros": 1706062041745531, "job": 161, "event": "table_file_deletion", "file_number": 199}
[2024/01/24 10:07:21.747 +08:00][4][INFO] EVENT_LOG_v1 {"time_micros": 1706062041745708, "job": 161, "event": "table_file_deletion", "file_number": 188}

In this example, the compaction inputs are file 199 from level 0 and file 188 from level 6, with an input size of 4344 bytes.
The second line shows a file creation: file 229, 3323 bytes.
The third line shows the compaction finishing, outputting to level 6 with one output file of 3323 bytes.
The last two lines show the input files being deleted, which completes the compaction.
Disk usage therefore goes from 4344 bytes, up to 4344 + 3323 = 7667 bytes while both input and output exist, then back down to 3323 bytes.
Since a single compaction is unlikely to involve all SST files of all column families (at most it touches two levels of one column family), the space doubling shouldn't be that extreme. Check the logs and the contents of the db directory.
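To watch that transient growth directly, you can total the SST file sizes in the db directory between checks. A minimal sketch; the demo directory and files here are fabricated for illustration, and in practice DB should point at the real TiKV data path (the db directory under the TiKV deploy/data dir):

```shell
# Total the size of all SST files in a RocksDB data directory.
# DB points at a throwaway demo dir here; use the real TiKV db path instead.
DB=demo_db
mkdir -p "$DB"
printf 'aaaa' > "$DB/000001.sst"    # 4 bytes
printf 'bbbbbb' > "$DB/000002.sst"  # 6 bytes
total=$(ls -l "$DB"/*.sst | awk '{s += $5} END {print s}')
echo "total_sst_bytes=$total"
```

For the demo files this prints `total_sst_bytes=10`. Sampling this number periodically shows the up-then-down pattern described above.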

| username: dba远航 | Original post link

How much space compaction reclaims depends on the data being stored.

| username: 田帅萌7 | Original post link

[2024/01/25 16:46:42.672 +08:00][2][INFO] [db/compaction/compaction_job.cc:1690] [lock] Compaction start summary: Base version 30145 Base level 0, inputs: [7753036(2496KB)], [7753032(412KB)]
[2024/01/25 16:46:42.672 +08:00][2][INFO] EVENT_LOG_v1 {"time_micros": 1706172402672831, "job": 44850, "event": "compaction_started", "compaction_reason": "LevelL0FilesNum", "files_L0": [7753036], "files_L6": [7753032], "score": 1, "input_data_size": 2979055}
[2024/01/25 16:46:42.702 +08:00][2][INFO] [db/compaction/compaction_job.cc:1373] [lock] [JOB 44850] Generated table #7753037: 2176 keys, 369969 bytes
[2024/01/25 16:46:42.703 +08:00][2][INFO] [db/compaction/compaction_job.cc:1436] [lock] [JOB 44850] Compacted 1@0 + 1@6 files to L6 => 369969 bytes
[2024/01/25 16:46:42.705 +08:00][2][INFO] (Original Log Time 2024/01/25-16:46:42.703281) [db/compaction/compaction_job.cc:767] [lock] compacted to: base level 6 level multiplier 10.00 max bytes base 134217728 files[0 0 0 0 0 0 1] max score 0.00, MB/sec: 100.1 rd, 12.4 wr, level 6, files in(1, 1) out(1) MB in(2.4, 0.4) out(0.4), read-write-amplify(1.3) write-amplify(0.1) OK, records in: 94037, records dropped: 91861 output_compression: NoCompression
[2024/01/25 16:46:42.705 +08:00][2][INFO] (Original Log Time 2024/01/25-16:46:42.703299) EVENT_LOG_v1 {"time_micros": 1706172402703291, "job": 44850, "event": "compaction_finished", "compaction_time_micros": 29764, "compaction_time_cpu_micros": 27119, "output_level": 6, "num_output_files": 1, "total_output_size": 369969, "num_input_records": 94037, "num_output_records": 2176, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 0, 0, 0, 1]}

| username: 田帅萌7 | Original post link

tikv:
    raftstore.capacity: 1024G
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: false
    readpool.unified.max-thread-count: 32
    storage.block-cache.capacity: 96G

| username: PINO | Original post link

From the logs it does appear to be compacting, but it's unclear whether that is automatic or manual compaction. Let's assume it's the manually triggered process.

| username: 田帅萌7 | Original post link

The manually triggered compaction process has already exited.

| username: TiDBer_jYQINSnf | Original post link

What you posted is an ordinary compaction pushing L0 files down, triggered by the number of L0 files. To see what happened at the time, look for the log entries from when you manually triggered the compact, whose compaction_reason marks a manual compaction.
If you just want to see the current status, you can list how many SSTs are in each level of the current RocksDB through:

tikv-ctl ldb manifest_dump --path=./db/MANIFEST-XX --hex

You can also check how many files are marked for deletion through:

tikv-ctl ldb list_file_range_deletes --db=./tikv/db/

Additionally, you can check if the db directory is normal through:

tikv-ctl ldb checkconsistency --db=./tikv/db/

However, it’s best not to tinker too much; the space will eventually shrink back.

| username: 江湖故人 | Original post link

There might be other compactions that haven't completed yet; let's wait and see.

| username: 田帅萌7 | Original post link

The logs show that it is still compacting.

| username: TiDBer_jYQINSnf | Original post link

Compaction is happening continuously.

| username: 田帅萌7 | Original post link

Is the impact that significant?

| username: TiDBer_jYQINSnf | Original post link

Compaction occurs whenever there are writes. The manual compaction you triggered is just a larger-scale compaction.

| username: 田帅萌7 | Original post link

The manually triggered compaction stopped 8 days ago. I understand the CPU was affected while it was running, but why is it still affecting the CPU after so long?