TiDB 5.1: Deleting 200 million records in a loop, but disk space hasn't decreased significantly after a week

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB5.1 进行循环删除2亿的数据但是磁盘空间一周时间了并没有下降很多。

| username: TiDBer_RQobNXGv

TiDB 5.1 is performing a loop deletion of 200 million records, but the disk space has not decreased significantly after a week. What could be the reason? How can this be resolved?

| username: 像风一样的男子 | Original post link

Delete will not release disk space; you need to manually compact to release it.

| username: TiDBer_RQobNXGv | Original post link

What is the manual compact command? Will it affect the stability of the production environment?

| username: 像风一样的男子 | Original post link

Manually compacting will consume KV disk resources. It is recommended to do it during off-peak business hours, one KV at a time.

| username: Kongdom | Original post link

Will TiDB immediately release space after deleting data?

| username: 哈喽沃德 | Original post link

How much data is left after deleting 200 million?

| username: 小龙虾爱大龙虾 | Original post link

How much space do you expect to free up?

| username: kelvin | Original post link

The delete operation will generate a lot of logs, right?

| username: TiDBer_lBAxWjWQ | Original post link

This delete doesn’t release. Creating a new table and renaming it would also work.

| username: tidb菜鸟一只 | Original post link

Use shell to compact only the oltp.sbtest8 table (replace with your table name), adding -c write -d kv

mysql -uroot -pXXX -hxxx -PXXX information_schema -e "select region_id from tikv_region_status where db_name='oltp' and table_name='sbtest8'" > region_list
cat region_list | while read line
do
    tiup ctl:v5.1.0 tikv --host xxxx:20160 compact -r $line -d kv -c write --threads 1 --bottommost force
    tiup ctl:v5.1.0 tikv --host xxx:20160 compact -r $line -d kv -c default --threads 1 --bottommost force
done
| username: zhanggame1 | Original post link

Manual compaction is also not effective; it’s extremely slow, consumes a lot of disk space, and the final result is not significant.

| username: 数据库真NB | Original post link

Simply deleting data will not reduce physical disk space.

Only deleting data files will reduce it. Additionally, note that the logs of the delete operation do not need to be retained.

| username: dba远航 | Original post link

It seems that the delete mechanism still needs improvement.

| username: Soysauce520 | Original post link

Take a look at the GC time and check if there is new data being written in the PD monitoring.

| username: zhaokede | Original post link

In business, we cannot use drop and truncate, only delete. In this case, don’t worry about whether the space is released, just focus on whether the table space is still growing.

| username: 哈喽沃德 | Original post link

TiDB adopts Multi-Version Concurrency Control (MVCC). To allow concurrent transactions to view earlier versions of data, deleted data does not immediately reclaim space but is instead garbage collected (GC) after a delay. You can configure the retention period of historical data by modifying the system variable tidb_gc_life_time (default value is 10m0s).

Parallel GC can be set to speed up space reclamation. The default concurrency is 1, and it can be adjusted up to 50% of the number of TiKV instances. You can use the command update mysql.tidb set VARIABLE_VALUE="3" where VARIABLE_NAME="tikv_gc_concurrency"; to adjust this.

DELETE, TRUNCATE, and DROP do not immediately release space. For TRUNCATE and DROP operations, TiDB’s GC mechanism will delete data and release space after the GC time is reached (default 10 minutes).

For DELETE operations, TiDB’s GC mechanism will delete the data but will not release the space. Instead, the space will be reused when subsequent data is written to RocksDB and compaction occurs.

| username: zhanggame1 | Original post link

I manually compacted it yesterday and tested it. Running it once might not be effective; it may need to be run repeatedly to see results.

| username: 像风一样的男子 | Original post link

If you have used MySQL, you would know that it operates on the same mechanism.

| username: zhanggame1 | Original post link

MySQL can use ALTER TABLE A ENGINE=InnoDB to rebuild the table. Higher versions support online DDL, making this problem very easy to solve.

| username: TiDBer_RQobNXGv | Original post link

How to solve this?