Fast Batch Deletion of TiDB Data

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB数据快速批量删除

| username: Carl

How to quickly batch delete historical data, with roughly tens of millions of records each day.

| username: buddyyuan | Original post link

Refer to this

| username: tidb菜鸟一只 | Original post link

Create a partitioned table and directly truncate by partition…

| username: Carl | Original post link

Alas, the original table is not a partitioned table. Now, creating a new partitioned table and truncating it might require adjustments to the program’s query structure. Is there a simple command to operate this?

| username: Carl | Original post link

Is it a batch statement operation? I tried it and found that it needs a limit of 1000. Currently, when executing, the server CPU becomes abnormally high, and the coprocessor time is very long. Does TiKV also participate in the computation? Additionally, after deletion, since TiKV also uses the MVCC mechanism, the actual database disk space will not be released immediately. Generally, what command needs to be operated for TiDB to re-analyze the database and release the invalid storage space?

| username: TiDBer_pkQ5q1l0 | Original post link

I don’t think there’s a quick way. I’m deleting 3000 rows at a time based on the timestamp in the large table, and looping through to delete them.

| username: Carl | Original post link

Yes.

| username: realcp1018 | Original post link

Try this: realcp1018/tidb-toolkit. Initially, it provided two methods for deleting data: based on rowid and time column. These two methods have been used internally for several years. Recently, a more general method has been added that supports all types of TiDB tables.

| username: Hacker007 | Original post link

There doesn’t seem to be a good method for looping through IDs in small batches at the moment.