About GC and Compact

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于gc和compact

| username: zhanggame1

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 7.6.0
[Reproduction Path] What operations were performed that caused the issue
[Encountered Issue: Issue Phenomenon and Impact]

I ran a test: I deleted all the data from a table with 100 million rows, waited for the GC life time to pass, and then manually ran compact several times; the number of regions went from 400 down to 1. Has the disk space been released?

My understanding is that compaction rewrites the data into new SST files and deletes the old SST files, purging the deleted data in the process, so the disk space should be released. I'm not sure if my understanding is correct.

| username: tidb菜鸟一只 | Original post link

Compact will release space.

| username: 扬仔_tidb | Original post link

(Image not available.)

| username: 扬仔_tidb | Original post link

However, this only works for TiFlash; it cannot clean up TiKV.

| username: 江湖故人 | Original post link

GC does not shrink SST files; it only writes deletion markers (tombstones) for the entries, so disk space is not freed.
Compaction will merge multiple small SST files into a larger SST file and clean up deleted entries, thereby freeing up disk space.
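To make the two points above concrete, here is a toy model of an LSM compaction, written for illustration only. The entry layout (key, sequence number, value) and the bottommost flag are simplifications assumed here, not RocksDB's actual structures; in real RocksDB a tombstone can sometimes be dropped above the bottom level, and live snapshots can keep old versions alive.

```python
# Toy model of an LSM compaction (not RocksDB's real code).
# Each SST is a list of (key, seq, value); value None marks a tombstone.
TOMBSTONE = None


def compact(ssts, bottommost=True):
    """Merge several SST runs into one sorted run.

    Only the newest version of each key (highest sequence number) survives.
    At the bottommost level a tombstone has nothing older left to shadow,
    so the tombstone itself is dropped too, and only then is space freed.
    """
    newest = {}
    for sst in ssts:
        for key, seq, value in sst:
            if key not in newest or seq > newest[key][0]:
                newest[key] = (seq, value)

    merged = []
    for key in sorted(newest):
        seq, value = newest[key]
        if value is TOMBSTONE and bottommost:
            continue  # the tombstone and the data it hid both disappear
        merged.append((key, seq, value))
    return merged


old_sst = [("a", 1, "v1"), ("b", 2, "v2"), ("c", 3, "v3")]
new_sst = [("b", 4, TOMBSTONE), ("c", 5, "v3-new")]  # "b" deleted, "c" updated

print(compact([old_sst, new_sst]))
# [('a', 1, 'v1'), ('c', 5, 'v3-new')]  -> 5 entries shrink to 2
```

Before compaction, the deleted key "b" still costs two entries (the old value and its tombstone); only the merge removes them.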

| username: 江湖故人 | Original post link

You can refer to this

| username: TiDBer_jYQINSnf | Original post link

After doing this, it feels like the space has been released. Has it actually not been released?

| username: forever | Original post link

After GC, empty regions will be merged, and after compaction, SST files will be merged to free up space.

| username: TiDBer_jYQINSnf | Original post link

If the region is not empty and there are quite a few delete keys, it can also reduce space, right?

| username: forever | Original post link

In the case of a large number of delete keys, space cannot be reduced without compaction.

| username: Jellybean | Original post link

In TiDB, using DELETE, TRUNCATE, and DROP statements to delete data will not immediately release space.

For TRUNCATE and DROP operations, once TiDB's GC life time has passed (tidb_gc_life_time, default 10 minutes), TiDB's GC mechanism deletes the data and releases the space.
For DELETE operations, TiDB’s GC mechanism will delete the data but will not immediately release the space. Instead, the space will be released during subsequent compaction.
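For the DELETE case, the rule GC applies to each key's MVCC versions can be sketched as below. This is a simplified model with hypothetical helper names: it ignores locks, rollback records, and the fact that the real safe point can be held back by long-running transactions. Note that "removing" a version here means writing a RocksDB delete, i.e. a tombstone, which is why the space only comes back after compaction.

```python
from datetime import datetime, timedelta

# Default of tidb_gc_life_time; the real safe point can also be held back
# by long-running transactions, which this sketch ignores.
GC_LIFE_TIME = timedelta(minutes=10)


def gc_safe_point(now, life_time=GC_LIFE_TIME):
    """Versions committed at or before this time are candidates for GC."""
    return now - life_time


def versions_to_gc(versions, safe_point):
    """versions: (commit_time, kind) pairs for ONE key, kind is "put" or "delete".

    Returns the versions GC would remove. Versions newer than the safe point
    are never touched. Of the older ones, the newest is kept if it is a put
    (a reader at the safe point still needs it); if it is a delete, nothing
    needs to be kept, so it goes as well.
    """
    old = sorted((v for v in versions if v[0] <= safe_point),
                 key=lambda v: v[0], reverse=True)
    if not old:
        return []
    newest, older = old[0], old[1:]
    if newest[1] == "delete":
        return [newest] + older
    return older


now = datetime(2024, 1, 1, 12, 0)
sp = gc_safe_point(now)  # 11:50
row = [(datetime(2024, 1, 1, 11, 0), "put"),
       (datetime(2024, 1, 1, 11, 40), "delete")]  # a row removed by DELETE
print(len(versions_to_gc(row, sp)))  # 2: both versions become RocksDB deletes
```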

| username: tidb菜鸟一只 | Original post link

Currently, you cannot compact a single table's data in TiKV directly. You can only find the regions corresponding to the table and then compact those regions.
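Finding a table's regions starts from the key range the table occupies. In TiDB's key layout, every row key and index key of table ID N begins with "t" followed by the 8-byte memcomparable encoding of N, so the table's data lies in [t{N}, t{N+1}). The sketch below computes that raw range; the helper names are hypothetical. In practice the table ID comes from information_schema.tables (TIDB_TABLE_ID), a partitioned table has one ID per partition, and SHOW TABLE ... REGIONS lists the regions. Keys as stored in TiKV and shown as region boundaries are additionally wrapped in a memcomparable bytes encoding, so check which format a tool such as tikv-ctl expects before passing it a range.

```python
import struct
from typing import Tuple

SIGN_MASK = 1 << 63


def encode_int(v: int) -> bytes:
    """Memcomparable encoding of a signed 64-bit integer:
    flip the sign bit, then write 8 bytes big-endian."""
    return struct.pack(">Q", (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK)


def table_key_range(table_id: int) -> Tuple[bytes, bytes]:
    """Raw [start, end) TiDB key range holding every row ("_r") and
    index ("_i") key of the given table."""
    return b"t" + encode_int(table_id), b"t" + encode_int(table_id + 1)


start, end = table_key_range(106)  # 106 is a made-up table ID
print(start.hex(), end.hex())
# 74800000000000006a 74800000000000006b
```

Flipping the sign bit is what makes byte-wise comparison agree with numeric order, so negative IDs would still sort before positive ones.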

| username: WinterLiu | Original post link

Got it. So we can't clean up TiKV this way; we can only clean up TiFlash.

| username: zhanggame1 | Original post link

The “compact” mentioned here refers to TiKV’s.

| username: TiDBer_jYQINSnf | Original post link

Compaction doesn't need to be triggered manually; it runs automatically in the background.

| username: zhanggame1 | Original post link

The trigger conditions for such resource-intensive operations automatically executed by TiDB are very conservative.

| username: forever | Original post link

It can be automatic or manual. For urgent space release, manual intervention is required. As @zhanggame1 mentioned, the automatic trigger conditions are very conservative. When you urgently need to clean up data and re-import it, you can’t just wait indefinitely.

| username: 扬仔_tidb | Original post link

Nowadays, disks are not expensive. Our TiDB cluster has never performed a compact operation, so we don’t mind the extra space. :grinning:

| username: zhanggame1 | Original post link

It's not just a space issue. If the space isn't released, SQL execution may still scan the deleted data, wasting a lot of I/O and time.
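The wasted I/O can be illustrated with a toy range scan (a simplified model, not TiKV's actual iterator): deleted entries that have not been compacted away return no rows, but the scan still has to step over each of them. In TiDB, EXPLAIN ANALYZE and the slow query log report this work as a RocksDB delete-skipped count (delete_skipped_count).

```python
def range_scan(entries, start, end):
    """Scan a sorted list of (key, value) entries in [start, end).

    value None is a tombstone that has not been compacted away yet. Tombstones
    return no rows, but the iterator still has to step over each one.
    Returns (rows, entries_visited).
    """
    rows, visited = [], 0
    for key, value in entries:
        if key < start:
            continue
        if key >= end:
            break
        visited += 1
        if value is not None:
            rows.append((key, value))
    return rows, visited


# 999 deleted rows (tombstones) followed by one live row
entries = [(f"k{i:04d}", None if i < 999 else "v") for i in range(1000)]
rows, visited = range_scan(entries, "k0000", "k9999")
print(rows, visited)  # [('k0999', 'v')] 1000
```

One row is returned, but 1000 entries are read; after compaction the same scan would touch a single entry.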

| username: TiDBer_jYQINSnf | Original post link

Compaction always runs when there are writes.

Since we’re on this topic, let me elaborate a bit on the compaction process.

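The write process is as follows: a write first goes to the WAL and then to the memtable, which lives in memory. When a memtable reaches write-buffer-size, it is switched to an immutable memtable (still in memory) and a new memtable takes over. Immutable memtables are flushed to disk in the background once min-write-buffer-number-to-merge of them have accumulated (1 by default); max-write-buffer-number caps how many memtables can exist at once, and when that limit is reached, writes stall until a flush finishes.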

The flush writes to level 0. When the number of files in level 0 reaches level0-file-num-compaction-trigger, it triggers a compaction from level 0 to level 1.
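The write path and the level 0 trigger can be sketched as a toy class. This is an illustration under simplifying assumptions: sizes are counted in entries rather than bytes, level 1 is a single sorted run, and WAL truncation after a flush is omitted. The parameter names mirror the RocksDB options mentioned above, but the class, its methods, and the small numbers are hypothetical.

```python
class ToyLSM:
    """Toy write path; sizes are counted in entries, not bytes."""

    def __init__(self, write_buffer_size=4, min_write_buffer_number_to_merge=1,
                 level0_file_num_compaction_trigger=4):
        self.write_buffer_size = write_buffer_size
        self.min_merge = min_write_buffer_number_to_merge
        self.l0_trigger = level0_file_num_compaction_trigger
        self.wal = []          # append-only log (truncation after flush omitted)
        self.memtable = {}     # mutable, in memory
        self.immutables = []   # full memtables waiting to be flushed
        self.level0 = []       # flushed SST "files", oldest first
        self.level1 = {}       # a single sorted run for simplicity

    def put(self, key, value):
        self.wal.append((key, value))  # 1. WAL first, for durability
        self.memtable[key] = value     # 2. then the memtable
        if len(self.memtable) >= self.write_buffer_size:
            self.immutables.append(self.memtable)  # switch to immutable
            self.memtable = {}
        if len(self.immutables) >= self.min_merge:
            self._flush()

    def _flush(self):
        merged = {}
        for mt in self.immutables:  # oldest first, so newer values win
            merged.update(mt)
        self.level0.append(dict(sorted(merged.items())))
        self.immutables = []
        if len(self.level0) >= self.l0_trigger:
            self._compact_l0_to_l1()

    def _compact_l0_to_l1(self):
        for sst in self.level0:  # oldest first, so newer values win
            self.level1.update(sst)
        self.level1 = dict(sorted(self.level1.items()))
        self.level0 = []


lsm = ToyLSM()
for i in range(12):
    lsm.put(f"k{i:02d}", i)
print(len(lsm.level0), len(lsm.level1))  # 3 0
for i in range(12, 16):
    lsm.put(f"k{i:02d}", i)
print(len(lsm.level0), len(lsm.level1))  # 0 16
```

Every fourth put fills a memtable and produces one level 0 file; the fourth level 0 file triggers the merge into level 1.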

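When does data continue to compact downwards? That is controlled by max-bytes-for-level-base (the target size of level 1) together with max-bytes-for-level-multiplier. For example, with a base of 1GB and a multiplier of 10, level 1 compacts into level 2 once it exceeds 1GB, level 2 compacts downwards at 10GB, level 3 at 100GB, and so on. Level 0 is governed by its file count rather than its size. TiKV's RocksDB uses 7 levels by default (num-levels = 7, i.e., L0 to L6).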

So, seen this way, a deleted key has a fairly low chance of being compacted soon, which makes it hard to reduce space; in theory, though, automatic compaction will eventually get a chance to clean it up.
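The level sizing in the example can be worked out with a few lines of Python. This sketch assumes static leveling with the example numbers (1GB base, multiplier 10), not TiKV's defaults; TiKV enables dynamic-level-bytes by default, which derives the targets from the actual size of the bottom level instead. Either way, roughly 90% of the data ends up in the last level, which is why a delete written near the top can take a long time to reach, and purge, the old data it shadows.

```python
GB = 1024 ** 3


def level_targets(base_bytes, multiplier, num_levels=7):
    """Static-leveling target size for L1..L(num_levels-1).

    L0 is not in the result: it is compacted by file count
    (level0-file-num-compaction-trigger), not by size.
    """
    return {level: base_bytes * multiplier ** (level - 1)
            for level in range(1, num_levels)}


targets = level_targets(1 * GB, 10)
for level, target in targets.items():
    print(f"L{level}: {target // GB} GB")
# L1: 1 GB ... L6: 100000 GB

share = targets[6] / sum(targets.values())
print(f"share of capacity in the last level: {share:.0%}")  # 90%
```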

(The following content hasn’t been thoroughly verified with the code, so it can only be roughly considered correct.)
So, why does merging empty regions reduce space more quickly?
It's because of delete_range. For empty regions, TiKV directly executes DeleteFilesInRange (a RocksDB function), which deletes the SST files lying entirely within that range instead of rewriting them.
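The idea behind DeleteFilesInRange can be modeled in a few lines. This is a sketch of the concept only: the SstFile class and the function below are hypothetical, and the range is treated as half-open like a region's [start_key, end_key), whereas the real RocksDB function has its own boundary options.

```python
from dataclasses import dataclass


@dataclass
class SstFile:
    name: str
    smallest: bytes  # first key in the file
    largest: bytes   # last key in the file (inclusive)


def delete_files_in_range(files, start, end):
    """Drop every file whose whole key span lies inside [start, end).

    Dropped files are simply unlinked, with no data rewritten, which is why
    space comes back quickly. Files that only partly overlap the range are
    kept; their keys in the range still need an ordinary delete plus compaction.
    """
    kept, dropped = [], []
    for f in files:
        if start <= f.smallest and f.largest < end:
            dropped.append(f)
        else:
            kept.append(f)
    return kept, dropped


files = [SstFile("1.sst", b"a", b"c"),
         SstFile("2.sst", b"d", b"f"),
         SstFile("3.sst", b"f", b"k")]
kept, dropped = delete_files_in_range(files, b"d", b"g")
print([f.name for f in dropped], [f.name for f in kept])
# ['2.sst'] ['1.sst', '3.sst']
```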