Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb数据库delete数据,存在mysql那样高水位的问题吗
There is no operation similar to “ALTER TABLE xx ENGINE=InnoDB;” in MySQL to rebuild the table and reclaim space.
Just delete it directly. It will be automatically reclaimed and compacted during GC to free up space, no need to rebuild.
The default GC time is 10 minutes. After deletion, multiple versions are saved according to the MVCC mechanism. Space is released after merging and deleting during the GC event.
DELETE, TRUNCATE, and DROP do not immediately release space. For TRUNCATE and DROP operations, after reaching TiDB’s GC (garbage collection) time (default 10 minutes), TiDB’s GC mechanism will delete the data and release the space. For DELETE operations, TiDB’s GC mechanism will delete the data but will not release the space. Instead, the space will be reused when subsequent data is written to RocksDB and compaction occurs.
The space reclamation operation relies on TiDB’s own GC mechanism.
Just delete it, using Navicat will be faster.
The underlying storage implementations of the two databases are different, and there is no concept of a high watermark for blocks. However, there are similar issues where data deleted is not immediately freed up. It needs to wait for GC, and GC only marks it for deletion. At the underlying level, RocksDB marks it as a tombstone, and space is freed up after compaction. TiKV has optimized a GC in compaction filter, which actively performs GC during compaction to delete data and free up space. Additionally, there is a threshold setting on regions that triggers region compaction when the number of deletions reaches a certain level.
TiDB has garbage collection (GC).
TiDB is also a stacked approach.
Yes, after using delete, the occupied disk space will not be released, but truncate and drop will release it.
As a DBA who switched careers halfway, I had never heard of the high water mark, so I specifically looked it up. But it doesn’t seem to have much to do with delete operations, right? 
The high water mark in a MySQL database refers to the issue where the amount of data in the database reaches a preset maximum value, causing a decline in database performance. As the amount of data in a MySQL database gradually increases, the indexes and data files in the database become larger and larger. If the preset maximum value is reached, a high water mark phenomenon occurs. This will lead to a decline in MySQL database performance and may result in data loss.
In a MySQL database, data is stored in units called pages. Each page is typically 16KB in size. When data is inserted or updated, the MySQL storage engine writes the data into the free space within the page. If a page is filled up, the storage engine will allocate a new page to store the data and raise the high water mark to the position of the new page.
The high water mark is not a specific position but an indicator of the used space in the current data file. When the high water mark rises, the free space in the data file decreases, leading to wasted disk space.
Watermark refers to a metric that can only increase and not decrease, even if all the data is deleted.
Learned something new, I hadn’t heard this saying before.
The terminology may vary, but they are roughly the same.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.