TiDB is slow in deleting data

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb删除数据很慢

| username: 奋斗的大象

Deleting data is very slow, taking more than 10 seconds to delete each time. The table has 1 billion rows and an index built on timeKey. The deletion is very slow. Does anyone have a faster solution?

| username: tidb狂热爱好者 | Original post link

| username: tidb狂热爱好者 | Original post link

First, find the IDs, and after finding them:

delete table where id > 999 and id < 99999

| username: 考试没答案 | Original post link

Is the execution time of this SQL slow? How long does it take?

| username: 考试没答案 | Original post link

Write a Python script to delete table where id = 1 and delete table where id = 2, and start 10 threads. The speed will soar! I often do this!
Note: Monitor the CPU.

| username: zhanggame1 | Original post link

Deleting with “LIMIT 100000” will get slower and slower. Do not include it; use the WHERE condition instead.

| username: 考试没答案 | Original post link

  1. Does this table have a primary key?
  2. How long does it take to execute select primary_key_id from deltable where timekey < '2023-01-01';?
  3. The Python script retrieves the primary key IDs from the above query and deletes them in parallel using delete from deltable where id = 111. Can you do it in Python?
| username: 奋斗的大象 | Original post link

The SQL statements to get the maximum and minimum IDs are very slow to execute, which is also a bottleneck.

| username: 奋斗的大象 | Original post link

10,000 rows

| username: 奋斗的大象 | Original post link

One million entries took 191 seconds.

| username: Soysauce520 | Original post link

Paged deletion, the solution was provided by the previous poster. It can also avoid hotspot issues.

| username: 考试没答案 | Original post link

Is it seconds or milliseconds???

| username: CuteRay | Original post link

First, post the configuration of your TiDB cluster, including the deployment architecture and the configuration of the corresponding machines.

Also, don’t use Navicat for testing. I noticed there are issues with the time display. Use the MySQL client and test it directly.

| username: TiDBer_5cwU0ltE | Original post link

Yes, insert the remaining data into another table, then drop the table, and finally rename it.

| username: 像风一样的男子 | Original post link

A script written by a community expert, take a look to see if it’s useful.

It can delete data in the millions without affecting the normal use of TiDB.

date1=date -d "1 day ago" +"%Y-%m-%d"
echo $date1
delete_db_sql=" delete from xxa where a.create_time < '2023-01-01 00:00:00' and a.intelligent_type = '2' limit 50000;"
echo $delete_db_sql
i=0

while ((++i)); do
    a=$(/bin/mysql -uroot -pxxx -A  xxx -h10.20.10.xx -P 4000 --comments -e "${delete_db_sql}" -vvv|grep "Query OK" |awk '{print $3}')
    echo $a
    if (($a<1)); then
        break 1
    fi
sleep 1
printf "%-4d" $((i))
done
| username: 连连看db | Original post link

Try this: realcp1018/tidb-toolkit

| username: 这里介绍不了我 | Original post link

How much data is left after deleting 1 billion records?

| username: YuchongXU | Original post link

Gradually batch delete it, this scheduled task.

| username: redgame | Original post link

None of the databases are fast.

| username: DBAER | Original post link

It looks like ms stands for milliseconds.