In the old version 4.0.2, can space from deleted data be reclaimed by scaling in to clear empty regions?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 老版本4.0.2版本delete之后的数据可以通过缩容清理空regin吗

| username: 舞动梦灵

We currently have two databases, running 4.0.2 and 4.0.9, holding data from around 2018 to the present. I have been deleting old data and have now deleted everything up to June 2022. That is roughly half of the data, an estimated 4 TB out of 8 TB. However, disk usage has not gone down. I noticed that in versions from 6.5 onward, the space left by delete operations is gradually merged away and disk usage drops, while versions below 6.5 do not have this feature. How can I free up space now? Would scaling in directly clean up the empty regions and save space?

| username: Hacker007 | Original post link

DELETE, TRUNCATE, and DROP do not immediately release space. For TRUNCATE and DROP operations, TiDB’s GC (garbage collection) mechanism will delete data and release space after reaching the GC time (default is 10 minutes). For DELETE operations, TiDB’s GC mechanism will delete data but will not release space. Instead, the space will be reused when subsequent data is written to RocksDB and compaction occurs.

| username: zhanggame1 | Original post link

I haven’t used the old version of TiDB. I saw in the 4.0 documentation that you can manually compact TiKV. If your disk doesn’t release space after GC, you might consider manual compaction. This operation has a high IO load, so avoid using it during peak business hours.

| username: tidb菜鸟一只 | Original post link

Clearing space mainly relies on compaction; you can try manual compaction.

| username: forever | Original post link

Give it a try, as suggested above.

| username: 舞动梦灵 | Original post link

I read in the documentation that before version 6.5, compaction only makes the space reusable and does not shrink disk usage. From version 6.5 onward, compaction actually frees space and significantly reduces disk usage.

| username: Miracle | Original post link

Isn't scaling in just going from three 1 TB servers to two 1 TB servers? That can't really be considered freeing up space, right?

| username: 舞动梦灵 | Original post link

Right now we just want to save resources and reduce the number of machines. We plan to scale in from 3 servers with 1 TB each to 2 servers. If the empty regions are released directly, won't the remaining 2 servers still have 1 TB each and still hold 3 replicas? Their disk usage won't increase. Doesn't that count as releasing space?
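The scale-in question above comes down to arithmetic, so here is a minimal sketch of it. It assumes that the data of a removed store is spread evenly over the remaining stores, and that every replica of a region has to sit on a different store. The function name `usage_after_scale_in` and the `max_fill` safety margin are hypothetical and are not TiDB settings.

```python
def usage_after_scale_in(used_tb, remove, capacity_tb, replicas=3, max_fill=0.8):
    """Estimate per-store usage after removing `remove` TiKV stores.

    used_tb     -- list of used space per store, in TB
    capacity_tb -- disk capacity of each store, in TB
    max_fill    -- assumed safe fill ratio (illustrative, not a TiDB default)
    """
    remaining = len(used_tb) - remove
    if remaining < replicas:
        # Each replica of a region must live on a different store.
        raise ValueError(f"{remaining} stores cannot hold {replicas} replicas")
    # Assumption: the data ends up spread evenly over the remaining stores.
    per_store = sum(used_tb) / remaining
    return per_store, per_store <= capacity_tb * max_fill


# Hypothetical cluster: four stores, each 1 TB in size and half full.
print(usage_after_scale_in([0.5, 0.5, 0.5, 0.5], remove=1, capacity_tb=1.0))
```

With only three stores and three replicas, `remove=1` raises `ValueError`, because two stores cannot hold three replicas of the same region. Total data does not shrink when a store is removed; it is only redistributed, so per-store usage goes up.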

| username: TiDBer_jYQINSnf | Original post link

Is it possible that there are many empty regions after deletion? In theory, empty regions should merge and reduce space.

Check whether GC is running normally.

Another possibility is that an empty region and its adjacent region are not on the same machine, so they cannot be merged. You can try running this in pd-ctl:
scheduler add shuffle-region-scheduler
It shuffles regions randomly, which may help. However, it can also cause performance jitter, so use it with caution.

| username: 舞动梦灵 | Original post link

Thank you all for your responses. Previously, I deleted data for a few days but didn’t see a reduction in storage. Then, I checked the official documentation for versions up to 6.5, which didn’t mention that delete would automatically shrink the storage, so I thought the disk space wouldn’t decrease. However, I just checked the monitoring and it seems the disk usage has decreased.

Looking at the 4.0.9 database, the disk usage has gone down somewhat, but not dramatically. It is just enough to scale in by one or two machines. Over a 15-day trend, the space decreased a bit: each server previously held 1.3-1.4 TB on average, and now holds between 800 GB and 1 TB.
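As a rough check of the "one or two machines" estimate, the sketch below counts how many stores could be removed before the remaining ones get too full. The per-store usage values are the midpoints of the ranges reported above (1.35 TB before, 0.9 TB now). The store count of 6, the 2 TB disk size, and the 0.8 fill limit are hypothetical, since the thread does not give them.

```python
def max_removable_stores(used_tb, capacity_tb, replicas=3, max_fill=0.8):
    """Count how many stores can be removed while staying under the fill limit.

    Assumes data is rebalanced evenly and that at least `replicas` stores
    must remain so each replica of a region has its own store.
    """
    total = sum(used_tb)
    stores = len(used_tb)
    removable = 0
    while True:
        left = stores - removable - 1  # stores left if one more is removed
        if left < replicas or total / left > capacity_tb * max_fill:
            return removable
        removable += 1


# Hypothetical cluster of 6 stores with 2 TB disks.
before = max_removable_stores([1.35] * 6, capacity_tb=2.0)
after = max_removable_stores([0.9] * 6, capacity_tb=2.0)
print(before, after)
```

Under these assumed numbers no store can be removed at 1.35 TB per store, while two can be removed at 0.9 TB per store. The real answer depends on the actual store count, disk size, and how much headroom is wanted.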

From February 1st to February 13th, I was deleting nearly 200 million records daily, then stopped deleting. From the 22nd, I started deleting around 100 million records daily again. I’ll continue to observe the situation after I finish deleting all the data.

| username: tidb菜鸟一只 | Original post link

Version 4 will also shrink back, especially if your deletes leave empty regions behind. After compaction, the empty regions are merged and space is saved.

| username: zhanggame1 | Original post link

Compaction is a RocksDB feature. In my tests with higher versions, delete operations also reclaim space, but that reclamation takes days, which is much slower than manual compaction.

| username: 舞动梦灵 | Original post link

Got it, understood. Previously, I was under the impression that delete does not release space. Thank you.

| username: 舞动梦灵 | Original post link

This is not urgent. I only recently took over this database from someone else. The immediate task is to clean up historical data and reduce costs. I want to keep manual intervention to a minimum, because earlier operations have already affected the business a lot.

| username: dba远航 | Original post link

GC is responsible for managing this. Check if the configuration is appropriate.

| username: WinterLiu | Original post link

Manual compaction should be able to do this.
