Exporting TiDB Database Table Data by Time

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb数据库表数据按时间导出

| username: 普罗米修斯

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.4
[Encountered Issues]

  1. Are there any tools to export database table data on a 3-month cycle and then import it back into the database? The purpose is to export part of the data from the production database and then import it into the development database for testing, all within the same TiDB instance.
  2. How should one delete databases or tables with large amounts of data to avoid significant impact on business operations?
| username: Jellybean | Original post link

There are actually many methods.

  1. For small amounts of offline data, you can use Dumpling and Lightning to import. For large amounts of data, use BR, or use TiCDC for real-time synchronization.

  2. It is recommended to use partitioned tables, partitioned by time. This allows you to delete historical data at any time with minimal impact on the cluster.
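A minimal sketch of the partitioned-table approach, assuming a hypothetical `orders` table with a `create_time` column (names and partition boundaries are illustrative only):

```sql
-- Sketch: a table range-partitioned by month.
CREATE TABLE orders (
    id BIGINT NOT NULL,
    create_time DATETIME NOT NULL
)
PARTITION BY RANGE (TO_DAYS(create_time)) (
    PARTITION p202207 VALUES LESS THAN (TO_DAYS('2022-08-01')),
    PARTITION p202208 VALUES LESS THAN (TO_DAYS('2022-09-01')),
    PARTITION p202209 VALUES LESS THAN (TO_DAYS('2022-10-01'))
);

-- Dropping an old partition removes its data almost immediately,
-- without the write amplification of a large DELETE.
ALTER TABLE orders DROP PARTITION p202207;
```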

| username: 普罗米修斯 | Original post link

Can Dumpling export a 3-month range of data by time?

| username: Jellybean | Original post link

My impression is that you can specify a WHERE condition to export data. If the data volume is not large, you can simply use mysqldump for the export and import. We often do this for tables with millions or tens of millions of rows; it is simple and straightforward when time is not a constraint.
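A sketch of the mysqldump round trip, assuming a hypothetical `prod_db.orders` table with a `create_time` column (host, credentials, and names must be adjusted to your environment):

```shell
# Export only the last 3 months of rows from the production database...
mysqldump -h 127.0.0.1 -P 4000 -u root -p --single-transaction \
  --where="create_time >= DATE_SUB(NOW(), INTERVAL 3 MONTH)" \
  prod_db orders > orders_3m.sql

# ...then load the dump into the development database on the same instance.
mysql -h 127.0.0.1 -P 4000 -u root -p dev_db < orders_3m.sql
```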

| username: 普罗米修斯 | Original post link

Okay, it seems that Dumpling cannot filter by time. I’ll use mysqldump instead.

| username: 普罗米修斯 | Original post link

How do I do this specifically? Please let me know.

| username: Jellybean | Original post link

It’s actually very simple. Choose range partitioning (by time). You can refer to the official documentation for the specific operations.

If you have any questions, feel free to discuss.

| username: Jellybean | Original post link

You can specify the time for export, similar to mysqldump.

You can use --where <SQL where expression> to specify the records to be exported:

./dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/test --where "id < 100"

The above command will export the data from the table that meets the condition id < 100. If you want to use time, just replace it with the corresponding condition.
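For the 3-month case specifically, one way is to compute the cutoff date in the shell and pass it into `--where`. A minimal sketch, assuming GNU `date` and a hypothetical `create_time` column (the command is printed rather than run, since host and credentials vary by environment):

```shell
# Compute the date 3 months ago in YYYY-MM-DD form (GNU date assumed).
CUTOFF=$(date -d '3 months ago' +%Y-%m-%d)

# Print the Dumpling invocation with the time filter substituted in.
echo "dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/export --where \"create_time >= '$CUTOFF'\""
```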

For more details, you can refer to the documentation.

| username: Kongdom | Original post link

For this kind, I usually write Kettle scripts to loop and extract data by date.
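The loop-by-date idea can be sketched in plain shell as well, assuming GNU `date`; the `echo` stands in for the real extraction step (hypothetical):

```shell
# Iterate day by day over the last 3 months, one extraction per day.
start=$(date -d '3 months ago' +%Y-%m-%d)
end=$(date +%Y-%m-%d)
d=$start
count=0
while [ "$d" != "$end" ]; do
  echo "extracting rows for $d"          # replace with the real extraction command
  d=$(date -d "$d + 1 day" +%Y-%m-%d)
  count=$((count + 1))
done
echo "days processed: $count"
```

Processing one day at a time keeps each batch small, which limits the load any single extraction puts on the cluster.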

| username: Jellybean | Original post link

Yes, writing a script to periodically export is also very convenient.

In practice, most of the time I write a Flink stream for real-time synchronization, which allows me to control the data volume, concurrency, and QPS at any time. It’s very convenient.

| username: liuis | Original post link

Write a script to export it yourself; these tools are developed in Go.

| username: 胡杨树旁 | Original post link

You can add conditions like --where or --sql when exporting.

| username: caiyfc | Original post link

In the same database, if the data volume is small, you can directly use insert into select, just add the time condition after where. If the data volume is large, you can use Dumpling to export, and you can add --where to filter the data.
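For the small-data case within the same instance, a minimal sketch (database, table, and column names are hypothetical):

```sql
-- Copy the last 3 months of rows from production to development
-- in a single statement on the same TiDB instance.
INSERT INTO dev_db.orders
SELECT * FROM prod_db.orders
WHERE create_time >= DATE_SUB(NOW(), INTERVAL 3 MONTH);
```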

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.