About Table Data Deletion, GC, and Region Merge

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于表删除数据后gc和region merge

| username: zhanggame1

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 7.1
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Issue Phenomenon and Impact]
Conducted a test, deleting data from a table with 10 million rows.
All data was deleted, but checking the disk space showed it was not released. The GC is set to default and progressing normally.
What is the issue? Approximately how long after deleting the data will the regions show as empty, and how long will it take to merge?

| username: zhanggame1 | Original post link

Deleted all data from the test table, but after 10 hours the space has not been released and the regions are still there.

| username: xfworld | Original post link

You can check the GC configuration for troubleshooting

The GC execution process can be referred to in the following document

If the data has been deleted and you need a quick GC to release it, you can execute it manually:

| username: zhanggame1 | Original post link

There are issues with the GC parameters. You can try deleting data, but the space occupied by regions will not be released.

| username: xfworld | Original post link

The merging and configuration of regions are related to the scheduling speed of the cluster.

Additionally, if the merging configuration is not enabled, empty regions will not be merged.

| username: redgame | Original post link

gc_life_time and gc_run_interval, and adjust according to the actual situation.”

| username: zhanggame1 | Original post link

Both gc_life_time and gc_run_interval are set to 10 minutes. You can test deleting a large amount of data; it’s very difficult to trigger space reclamation.

| username: zhanggame1 | Original post link

The merge configuration is enabled by default. Although all the data has been deleted, as shown in the figure above, the size of the queried regions is still higher than the merge trigger condition of 20M.

| username: zhanggame1 | Original post link

I have also tried compact, but it didn’t have any effect.

| username: xfworld | Original post link

The merge scheduling strategy can be adjusted because too frequent scheduling will affect the performance of the cluster, and the default value is relatively conservative.
There is also some introduction in the official documentation:

If the merge is not effective, you can check whether the cross-table merge parameter is enabled. Refer to this document:

| username: zhanggame1 | Original post link

Using version 7.1, all parameters for cross-table merging are enabled. The test is relatively simple: create a new table, insert a large amount of data, then loop to delete all of it. It turns out that the space cannot be reclaimed.

| username: cy6301567 | Original post link

This is explained in more detail.

| username: TiDBer_vfJBUcxl | Original post link

After performing delete operations on data, space reclamation is relatively slow. How to handle it?
TiDB uses Multi-Version Concurrency Control (MVCC). To allow concurrent transactions to view earlier versions of the data, deleted data does not immediately reclaim space but instead delays garbage collection (GC) for a period of time.

  1. You can configure the retention period of historical data by modifying the system variable tidb_gc_life_time (default value is 10m0s).
  2. You can set parallel GC to accelerate space reclamation. The default concurrency is 1, and it can be adjusted up to 50% of the number of TiKV instances. You can use the command update mysql.tidb set VARIABLE_VALUE=3 where VARIABLE_NAME="tikv_gc_concurrency"; to adjust it.

Does TiDB immediately release space after deleting data?
DELETE, TRUNCATE, and DROP do not immediately release space. For TRUNCATE and DROP operations, after reaching TiDB’s GC (garbage collection) time (default 10 minutes), TiDB’s GC mechanism will delete the data and release the space. For DELETE operations, TiDB’s GC mechanism will delete the data, but it will not release the space. Instead, the space will be reused when subsequent data is written to RocksDB and compacted.

What is the most efficient and fastest way to delete data?
When deleting a large amount of data, it is recommended to use DELETE FROM t WHERE x LIMIT 5000. It is advisable to use a strong filtering index column or directly use the primary key to select a range, such as id >= 5000n + m AND id <= 5000(n + 1) + m, and delete in a loop, using Affected Rows == 0 as the loop termination condition. This avoids the limitation of transaction size. If the amount of data to be deleted at once is very large, this looping method will become slower because each deletion traverses from the beginning. After the initial deletions, many deletion marks will remain for a short time (which will be GC’d later), affecting subsequent DELETE statements. If possible, it is recommended to refine the WHERE condition.

| username: zhanggame1 | Original post link

For the DELETE operation, TiDB’s GC mechanism will delete the data but will not release the space.

In other words, even after the DELETE operation and GC, the space will not be released.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.