Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb删除数据很慢
Deleting data is very slow, taking more than 10 seconds to delete each time. The table has 1 billion rows and an index built on timeKey. The deletion is very slow. Does anyone have a faster solution?
First, find the IDs, and after finding them:
delete table where id > 999 and id < 99999
Is the execution time of this SQL slow? How long does it take?
Write a Python script to delete table where id = 1 and delete table where id = 2, and start 10 threads. The speed will soar! I often do this!
Note: Monitor the CPU.
Deleting with “LIMIT 100000” will get slower and slower. Do not include it; use the WHERE condition instead.
The SQL statements to get the maximum and minimum IDs are very slow to execute, which is also a bottleneck.
One million entries took 191 seconds.
Paged deletion, the solution was provided by the previous poster. It can also avoid hotspot issues.
Is it seconds or milliseconds???
First, post the configuration of your TiDB cluster, including the deployment architecture and the configuration of the corresponding machines.
Also, don’t use Navicat for testing. I noticed there are issues with the time display. Use the MySQL client and test it directly.
Yes, insert the remaining data into another table, then drop the table, and finally rename it.
A script written by a community expert, take a look to see if it’s useful.
It can delete data in the millions without affecting the normal use of TiDB.
date1=date -d "1 day ago" +"%Y-%m-%d"
echo $date1
delete_db_sql=" delete from xxa where a.create_time < '2023-01-01 00:00:00' and a.intelligent_type = '2' limit 50000;"
echo $delete_db_sql
i=0
while ((++i)); do
a=$(/bin/mysql -uroot -pxxx -A xxx -h10.20.10.xx -P 4000 --comments -e "${delete_db_sql}" -vvv|grep "Query OK" |awk '{print $3}')
echo $a
if (($a<1)); then
break 1
fi
sleep 1
printf "%-4d" $((i))
done
How much data is left after deleting 1 billion records?
Gradually batch delete it, this scheduled task.
None of the databases are fast.
It looks like ms stands for milliseconds.