[Urgent Help Needed] Will Dropping a 12TB Table in Production Environment TIDB Affect Business Operations?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【紧急求助】生产环境 TIDB DROP删除一个12T的大表对业务有没有影响?

| username: 苏半生Su

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4
Will dropping a 12T large table affect the business?
Is there a way to drop large tables using hard links similar to MySQL?

| username: TiDBer_小阿飞 | Original post link

There will definitely be an impact because of the GC mechanism and MVCC.

If you need to delete a large number of rows (tens of thousands or more), it is recommended to use an iterative approach, deleting only a portion of the data each time until the deletion is complete. This is because TiDB has a single transaction size limit of txn-total-size-limit (default is 100MB). You can use a loop in your program or script to accomplish this.

| username: 像风一样的男子 | Original post link

Deleting 12TB of data, are you crazy? Doing this will cause significant database fluctuations.

| username: h5n1 | Original post link

The drop operation will clean up SST files after reaching the GC time, which may cause some IO. After that, there will be a large number of empty regions, and PD will schedule region merges, consuming some resources. The overall impact is not significant.

| username: 像风一样的男子 | Original post link

Theoretically, it has no impact. It is recommended to drop during off-peak business hours.

| username: 路在何chu | Original post link

He is drop

| username: TiDBer_小阿飞 | Original post link

If you need to delete all data in a table, do not use the DELETE statement; instead, you should use the TRUNCATE statement.

| username: 像风一样的男子 | Original post link

I agree with the suggestion from the previous post to delete it.

| username: 扬仔_tidb | Original post link

Things that must be done, must be done. Execute during off-peak hours. However, performing GC on TiKV will have a performance impact.

| username: 路在何chu | Original post link

There should be a difference between tidb.drop and truncate, right?

| username: TiDBer_小阿飞 | Original post link

Isn’t there also non-transactional DML? :joy:

| username: 随缘天空 | Original post link

It is not recommended to perform such a large operation at once. You can set up a timer to delete asynchronously with multiple threads at a fixed time every day at midnight. Once the data volume decreases, then execute the truncate table operation.

| username: 随缘天空 | Original post link

There should be an impact. Deleting tables and the data within them requires CPU and memory resources after all.

| username: 苏半生Su | Original post link

If you need to delete a large number of rows (tens of thousands or more), it is recommended to drop the table.

| username: 苏半生Su | Original post link

Dropping a table instead of deleting it, will this also cause GC and IO to freeze?

| username: 像风一样的男子 | Original post link

When executing the DROP TABLE operation, a large amount of continuous data will be deleted. TiDB will record these ranges to be deleted and the timestamps of the delete operations, but it will not immediately perform a physical deletion. During the GC phase, Delete Ranges will quickly perform physical deletion on the ranges with timestamps before the safe point. Physical deletion mainly consumes disk I/O and has little impact on CPU and memory.

| username: 苏半生Su | Original post link

Is the drop a sudden spike in disk IO, or a slow drop? If the IO directly hits 100%, wouldn’t the business be done for?

| username: 像风一样的男子 | Original post link

I have only dropped a table with over 10 billion rows without any issues. Your 12TB of data is too large, and no one has done it before, so I can’t guarantee anything. Therefore, I suggest you do it during a business downtime.

| username: zhanggame1 | Original post link

It would take a year to delete all of it. If you don’t believe me, test it yourself. With 100 million records, it starts deleting super fast, but then it gets slower and slower, from tens of milliseconds to several seconds each time.

| username: zhanggame1 | Original post link

I strongly advise against deleting data. The continuous delete performance of TiDB is very poor and will get slower and slower. Additionally, the amount of data on the hard disk will surge, potentially filling up the hard disk very quickly.