Does TIDB delete cause disk fragmentation?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB delete会产生磁盘碎片吗

| username: Mirror

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v7.5
[Reproduction Path] We have a very large table stored in TiDB with billions of records, but the application says it only needs to store data for six months. If I delete the data, without discussing the speed issue, will there be disk fragmentation? Does anyone have similar experience, and how should we clean up the space?
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots / Logs / Monitoring]

| username: Soysauce520 | Original post link

There is no disk fragmentation, the disk will be organized during GC.

| username: Mirror | Original post link

Will that space be released?

| username: lemonade010 | Original post link

TiDB storage nodes automatically initiate data compaction in the background. During data compaction, the physical data in the table is rewritten, such as cleaning up deleted data, merging multiple versions of data, etc., which can result in higher access performance and reduced disk space usage. Using the ALTER TABLE ... COMPACT statement allows you to immediately compact the specified table without waiting for the background trigger.

This statement does not block the execution of existing SQL statements or the use of TiDB features, including transactions, DDL, GC, etc., nor does it change the data content accessed through SQL statements. The execution of this statement will consume a certain amount of IO and CPU resources, so please choose an appropriate time to execute it, such as during idle resource periods, to avoid negative impacts on business.

The statement will wait until all replicas of the table have completed data compaction before finishing and returning. During execution, you can safely interrupt the data compaction process of the table using the KILL statement. Interrupting will not compromise data consistency or cause data loss, nor will it affect subsequent re-initiations or automatic background data compaction triggers.

| username: Kongdom | Original post link

:thinking: Is this several billion the total data volume or the data volume for half a year? If it’s the total data volume, then what is the data volume for half a year?
If the data volume for half a year is relatively small, can we create a new table, migrate the half-year data, and then replace the table by renaming it?

| username: TiDBer_ivan0927 | Original post link

The underlying layer is leveldb. When deleting data, there may be fragments, but leveldb will perform compaction. This process merges multiple small files into larger files and recovers unused space. This helps reduce disk fragmentation and improve storage efficiency.

| username: Mirror | Original post link

It’s not realistic to archive data from six months, which amounts to billions, through migration.

| username: Mirror | Original post link

  1. After compaction, will the space usage automatically decrease? Or will it not decrease, but the merged space can be reused?
  2. For my large table, is it better to clean it using the delete method or to solve it with partition tables? When using partition tables, is there a requirement that the where condition must include the partition key?
    Thanks for the reply.
| username: 小龙虾爱大龙虾 | Original post link

The delete operation will eventually free up space, just do it :grinning:

| username: xmlianfeng | Original post link

After deletion, the MVCC versions will be cleaned up when the GC time arrives, and the space will be released.

| username: porpoiselxj | Original post link

For scenarios where you need to clean up historical data, this is a typical use case for partitioned tables. When data expires, you can directly drop the partition. During normal use, it’s best if the conditions include the partition field. If they don’t, you’ll have to scan all partitions, which is slightly less efficient than having an index on the partition field in a single table.

| username: 小于同学 | Original post link

GC will solve it.

| username: dba远航 | Original post link

TiDB uses LSM-Tree, with data continuously increasing and GC cleaning up expired data, so there is no issue of fragmentation.

| username: TIDB-Learner | Original post link

The delete operation only marks the data for deletion, and the expired data is periodically cleaned up through GC. Compaction will be performed, which merges multiple small files into larger files to reduce fragmentation. Is that correct?

| username: zhanggame1 | Original post link

It will release very slowly and cannot be fully released. If you want to speed up the release, you need to manually compact, which consumes a lot of resources.

| username: zhanggame1 | Original post link

For billions of rows, you should use partition tables. Dropping a partition takes just one second.

The delete operation first triggers garbage collection (GC), which works like this:
The hard drive has a bunch of SST files storing sorted key-value data. A delete operation actually writes new key-value pairs, and then GC marks these unused key-value pairs as deleted.
The compaction operation re-sorts and compresses the SST files, discarding the data marked for deletion, then generates new SST files and deletes the old ones.

| username: zhanggame1 | Original post link

GC cannot release, compact releases disk space very slowly, resource consumption is particularly high, and merge sorting is required.

| username: TiDBer_ivan0927 | Original post link

It will be released, I recommend using partitioned tables.

| username: residentevil | Original post link

No, the underlying engine is ROCKSDB, which has an automatic COMPACTION mechanism. As long as there are continuous writes online, the space will also decrease accordingly.

| username: WinterLiu | Original post link

You can consider creating partitions and directly deleting partitions later.