Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 老版本4.0.2版本delete之后的数据可以通过缩容清理空regin吗 (On the old 4.0.2 version, can the empty Regions left after DELETE be cleaned up by scaling in?)
We currently have two TiDB clusters, on 4.0.2 and 4.0.9, holding data from around 2018 to the present. I have been deleting old data, and everything up to June 2022 is now gone. That is roughly half of the data: an estimated 4 TB out of 8 TB. However, disk usage has not gone down. I noticed that in versions after 6.5, space from DELETE operations is gradually reclaimed through compaction, while versions below 6.5 do not have this feature. How can I save space now? I'm wondering whether scaling in directly would clean up the empty Regions and save space.
DELETE, TRUNCATE, and DROP do not immediately release space. For TRUNCATE and DROP operations, TiDB’s GC (garbage collection) mechanism will delete data and release space after reaching the GC time (default is 10 minutes). For DELETE operations, TiDB’s GC mechanism will delete data but will not release space. Instead, the space will be reused when subsequent data is written to RocksDB and compaction occurs.
I haven’t used the old version of TiDB. I saw in the 4.0 documentation that you can manually compact TiKV. If your disk doesn’t release space after GC, you might consider manual compaction. This operation has a high IO load, so avoid using it during peak business hours.
Clearing space mainly relies on compaction; you can try manual compaction.
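For anyone who wants to script it, here is a minimal sketch that only builds the `tikv-ctl compact` command line for one TiKV instance and does not run anything. The `--host`, `-d`, `-c` and `--bottommost` options are the ones described in the TiKV Control documentation; the address `127.0.0.1:20160` and the helper name `build_compact_command` are placeholders of mine, so check `tikv-ctl compact --help` on your 4.0.x binary before relying on it.

```python
import shlex


def build_compact_command(host, db="kv", cf=None, bottommost=None):
    """Build (but do not run) a tikv-ctl manual compaction command.

    host:       address of ONE TiKV instance, e.g. "127.0.0.1:20160" (placeholder).
    db:         "kv" for the data RocksDB instance.
    cf:         optional column family: "default", "write" or "lock".
    bottommost: optional "default", "skip" or "force".
    """
    cmd = ["tikv-ctl", "--host", host, "compact", "-d", db]
    if cf is not None:
        cmd += ["-c", cf]
    if bottommost is not None:
        cmd += ["--bottommost", bottommost]
    return cmd


# Print the commands for the two column families that hold most of the data.
# Run them one TiKV at a time, outside peak hours, because compaction is IO heavy.
for column_family in ("write", "default"):
    print(shlex.join(build_compact_command("127.0.0.1:20160", cf=column_family)))
```

Printing the commands first instead of executing them makes it easy to review what will run on each store.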
Give it a try, as the guy suggested.
I read in the documentation that before version 6.5, compact was about reusing space, not saving space. From version 6.5 onwards, compact is about saving space and significantly reducing disk usage.
Isn't scaling in just going from three 1 TB servers to two 1 TB servers? That can't really be considered freeing up space, right?
Right now we just want to save resources and reduce the number of machines. We plan to scale in from 3 servers with 1 TB each to 2 servers. If the empty Regions are released directly, the remaining 2 servers would still have only 1 TB each and hold 3 replicas, and their disk usage would not increase. Doesn't that count as releasing space?
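A rough way to sanity-check a scale-in plan like this is to spread the total used space over the stores that remain. The sketch below assumes data ends up evenly balanced and that each store holds at most one replica of a Region, so a 3-replica cluster needs at least 3 TiKV stores; the function name and the numbers in the usage note are hypothetical, not taken from this cluster.

```python
def per_store_usage_after_scale_in(total_used_gb, stores_after, replicas=3):
    """Estimate average disk usage per TiKV store after scaling in.

    total_used_gb: disk used summed over ALL stores today, so it already
                   counts every replica.
    stores_after:  number of TiKV stores left after scaling in.
    replicas:      replica count (max-replicas), 3 by default.
    """
    if stores_after < replicas:
        # Each store can hold only one replica of a given Region, so the
        # replicas cannot all be placed on fewer stores than the replica count.
        raise ValueError(
            f"{replicas} replicas cannot be placed on {stores_after} stores"
        )
    return total_used_gb / stores_after
```

For example, 5 stores at about 900 GB each hold 4500 GB in total; after removing one store the estimate is 4500 / 4 = 1125 GB per store, so the remaining disks need that much headroom. Going from 3 stores to 2 with 3 replicas raises the error instead, which matches the point that this plan does not work as a way to free space.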
Is it possible that there are many empty regions after deletion? In theory, empty regions should merge and reduce space.
Check the GC.
Another possibility is that an empty Region and its adjacent Region are not on the same machine, so they cannot be merged. In pd-ctl you can try
scheduler add shuffle-region-scheduler
to randomly shuffle and see if it helps. However, this can also cause performance jitter, so use it with caution.
Thank you all for your responses. Previously, I deleted data for a few days but didn’t see a reduction in storage. Then, I checked the official documentation for versions up to 6.5, which didn’t mention that delete would automatically shrink the storage, so I thought the disk space wouldn’t decrease. However, I just checked the monitoring and it seems the disk usage has decreased.
Looking at the 4.0.9 cluster now, disk usage has gone down somewhat, but not dramatically. It is just enough to scale in by one or two machines. Over a 15-day trend the space dropped a bit: each server previously averaged 1.3-1.4 TB, and now it is between 800 GB and 1 TB.
From February 1st to February 13th, I was deleting nearly 200 million records daily, then stopped deleting. From the 22nd, I started deleting around 100 million records daily again. I’ll continue to observe the situation after I finish deleting all the data.
Version 4 will also shrink back, especially if the deletes leave empty Regions behind. Compaction will merge Regions to save space.
Compact is a feature of RocksDB. In my tests with higher versions, delete operations also reclaim space, but the speed of this reclamation is measured in days, which is much slower than manual compaction.
Got it, understood. Previously, I was under the impression that delete does not release space. Thank you.
This is not urgent. I only recently took over this database from someone else. The immediate task is to clean up historical data and reduce costs, with as little manual intervention as possible. Previous operations have already impacted the business a lot.
GC is responsible for managing this. Check if the configuration is appropriate.
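One way to check this on 4.0, as far as I know, is to read the `tikv_gc%` rows from the `mysql.tidb` table and see how far the safe point lags behind the current time. The sketch below only parses rows you have already fetched with any MySQL client. The timestamp format (`YYYYMMDD-HH:MM:SS +ZZZZ`) is my assumption about how the value is displayed, and the sample values in the usage note are made up.

```python
from datetime import datetime, timedelta, timezone

# Query to run against TiDB with your usual MySQL client.
GC_QUERY = (
    "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb "
    "WHERE VARIABLE_NAME LIKE 'tikv_gc%'"
)


def gc_safe_point_lag(rows, now):
    """Return how far tikv_gc_safe_point lags behind `now`.

    rows: iterable of (VARIABLE_NAME, VARIABLE_VALUE) pairs from GC_QUERY.
    now:  timezone-aware datetime to compare against.
    """
    values = dict(rows)
    # Assumed display format, e.g. "20230301-11:50:00 +0800".
    safe_point = datetime.strptime(
        values["tikv_gc_safe_point"], "%Y%m%d-%H:%M:%S %z"
    )
    return now - safe_point
```

With made-up rows such as `("tikv_gc_life_time", "10m0s")` and `("tikv_gc_safe_point", "20230301-11:50:00 +0800")`, and `now` set to 12:00:00 in the same time zone, the lag comes out as 10 minutes. A lag far larger than `tikv_gc_life_time` suggests GC is stuck, in which case deleted data cannot be cleaned up no matter how much compaction runs.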
Manual compaction should be achievable.