Dropping a table does not physically delete the corresponding data

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 删除表,对应的数据没有被物理删除。

| username: 泰迪比爱好者

As mentioned, I deleted a table, but its corresponding data was not physically deleted. Is there a similar recycle bin mechanism? How should I delete the corresponding data?

| username: Miracle | Original post link

There is GC, with a default of ten minutes. I’ll check again later.

| username: 泰迪比爱好者 | Original post link

I’ve been waiting all night, and it still hasn’t been deleted.

| username: Miracle | Original post link

Take a look at the GC configuration:

SHOW VARIABLES LIKE '%tidb_gc_life_time%';
| username: 泰迪比爱好者 | Original post link

This is 10 minutes.

| username: TiDBer_小阿飞 | Original post link

That’s how MVCC works.

| username: 泰迪比爱好者 | Original post link

I didn’t understand what you meant, could you explain it in more detail?

| username: Fly-bird | Original post link

MVCC mechanism

| username: 泰迪比爱好者 | Original post link

However, I want to delete the data now. What should I do? Otherwise, there will be several terabytes of junk data.

| username: zhanggame1 | Original post link

Check this to see the actual execution status of GC.

| username: Kongdom | Original post link

This should be deleted by the GC mechanism.

| username: 泰迪比爱好者 | Original post link

However, from my observation, it hasn’t been deleted even after an entire night.

| username: 路在何chu | Original post link

It seems like there’s no other way, we can only wait for the GC to clean it up.

| username: 路在何chu | Original post link

SELECT * FROM mysql.tidb WHERE variable_name =‘tikv_gc_safe_point’; You can check up to which point in time it has been cleaned.

| username: Kongdom | Original post link

Refer to this for troubleshooting:

| username: TiDBer_小阿飞 | Original post link

TiDB’s transaction implementation uses the MVCC (Multi-Version Concurrency Control) mechanism. When new data overwrites old data, the old data is not replaced but is retained alongside the new data, with timestamps used to distinguish between versions.

| username: TiDBer_小阿飞 | Original post link

The way TiDB handles the delete statement is by marking it for deletion. The deletion itself is actually inserting a kv record, but the value becomes delete. The real data is eventually deleted through logical GC and compaction. Therefore, when executing the delete statement in a loop, each time n records are deleted, the next delete statement will scan +n keys, and the execution time will become longer and longer (you can do an experiment and observe the slow log file, the total keys of the same delete statement will keep increasing).

So, how to efficiently delete & archive records before a specific date?

First, we know that TiDB has limitations on transaction size:

  1. A single transaction contains no more than 5000 SQL statements.
  2. The size of a single record operated on does not exceed 6MB.
  3. The total keys operated on by the transaction do not exceed 300,000.
  4. The total size of all records operated on by the transaction does not exceed 100MB.

Due to TiDB’s transaction limitations and the implementation principle of TiDB MVCC, there is currently no good method to delete & archive data within a specific range. Here are some personal insights for your reference:

First method:
Try to reduce the granularity of the deletion range, such as segmenting the data by minute in advance, enabling tidb_batch_delete, and increasing concurrency to delete. Pay attention to using open and closed intervals, and avoid conflicts between segments, as the cost of resolving transaction conflicts in TiDB is relatively high.

set @@session.tidb_batch_delete=1;
delete from table where create_time > '$start_step' and create_time <= '$end_step';

If the data within the segment exceeds the transaction size limit, TiDB will automatically split the delete operation into multiple batches. Based on personal experience, this method of deleting data is relatively fast.

Second method:
Partition the table by date and delete the expired tables. TiDB can delete tables in seconds, and subsequent space recovery is also relatively fast. The downside is that it is intrusive to the business. Each method has its pros and cons, so you can choose according to your needs.

| username: dba-kit | Original post link

Let’s take a look at this first. Transactions that have not been released will also prevent automatic GC.

| username: 随缘天空 | Original post link

I encountered a similar issue before. After deletion, TiDB’s default garbage collection time is 10 minutes, and then it automatically releases the space. If it still doesn’t, you should check whether the GC mechanism is turned off or what the set time is. If you have similar needs in the future, you can use the TRUNCATE TABLE statement to clear all data in the table. This statement will immediately release the storage space occupied by the table and mark all data rows as deleted.