Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Tidb按条件删除数据能释放磁盘空间吗,如DELETE FROM table_x WHERE id IN (xxxx,xx)
Can deleting data based on conditions in TiDB release disk space, such as DELETE FROM table_x WHERE id IN (xxxx,xx)?
Delete will not release disk space; you can manually compact to release it.
Will this operation block? The data volume is quite large.
Manual compaction will occupy a large amount of disk space and is quite slow, so it is recommended to perform it during off-peak business hours.
In TiDB, using DELETE
, TRUNCATE
, and DROP
statements to delete data will not immediately release space. For TRUNCATE
and DROP
operations, after reaching TiDB’s GC (garbage collection) time (default is 10 minutes), TiDB’s GC mechanism will delete the data and release the space. For DELETE
operations, TiDB’s GC mechanism will delete the data but will not immediately release the space; instead, the space will be released during subsequent compaction.
Yes, after deleting, it will first go through GC, and then automatically compact after a day or two. Manual compaction is very effective but consumes a lot of resources, so use it cautiously in production environments.
Based on my testing experience, do not manually compact the database running the business; both CPU and IO can be fully utilized. After deleting the data, let the database compact slowly over a few days.
The image you provided is not visible. Please provide the text you need translated.
If space is not urgent, wait for the GC to complete and then automatically compact…
Okay, thank you. So it means I can delete the data first and let it release itself, right?
There is still 20% left, but there is too much useless data. I don’t want to expand anymore, I want to delete it directly.
When deleting the remaining 20%, be careful. Deleting in TiDB involves writing data, so don’t delete too much in a short period of time to avoid filling up the hard drive. After deleting, observe the disk space for a few days to see the release situation. If you want to know the space usage of a specific table in detail, you can check the INFORMATION_SCHEMA.TIKV_REGION_STATUS view, which contains t.APPROXIMATE_SIZE and t.APPROXIMATE_KEYS.
If there’s still 20% left, it’s quite risky. It’s recommended to scale up first. Deleting data is slow, and you need to delete a lot of data before automatic compaction can free up some space.
The table structure to be cleared occupies about 80% of the used space
To delete data from a table, the current data occupies about 80% of the used space , which means it occupies about 70% of the entire disk.
Then take it slowly, delete a portion each day, and continue deleting for a few days.
After deletion, if there is no GC, not only will the space not be released, but it may also affect query efficiency.
Higher versions have fixed issues related to MVCC versions. If you need to perform delete operations, you should consider this issue.
It’s not easy to upgrade the version now, as it’s a 24-hour operation. The best approach is to first scale out the nodes, and then proceed with the deletion after the expansion.
I’ll try it first, delete a part and see, while getting ready to apply for resource expansion.