Exporting TiDB Database Table Data by Time

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb数据库表数据按时间导出

| username: 普罗米修斯

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.4
[Encountered Issues:]

  1. Are there any tools to export database table data on a 3-month cycle and then import it back into the database? The purpose is to export part of the data from the production database and then import it into the development database for testing, all within the same TiDB instance.
  2. How should one delete databases or tables with large amounts of data to avoid significant impact on business operations?
| username: Jellybean | Original post link

There are actually many methods.

  1. For small amounts of offline data, you can use Dumpling and Lightning to import. For large amounts of data, use BR, or use TiCDC for real-time synchronization.

  2. It is recommended to use partitioned tables, partitioned by time. This allows you to delete historical data at any time with minimal impact on the cluster.

| username: 普罗米修斯 | Original post link

Can duimping specify a 3-month export by time?

| username: Jellybean | Original post link

My impression is that you can specify the WHERE condition to export data. If the data volume is not large, you can completely use mysqldump to export and import. We often do this in scenarios with millions or tens of millions of rows, as it is simple and straightforward when time is not a constraint.

| username: 普罗米修斯 | Original post link

Okay, it seems that dumpling cannot specify the time. I’ll use mysqldump instead.

| username: 普罗米修斯 | Original post link

How do I do this specifically? Please let me know.

| username: Jellybean | Original post link

It’s actually very simple. Choose range partitioning (by time). You can refer to the official documentation for specific operations:

If you have any questions, feel free to discuss.

| username: Jellybean | Original post link

You can specify the time for export, similar to mysqldump.

You can use --where <SQL where expression> to specify the records to be exported:

./dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/test --where "id < 100"

The above command will export the data from the table that meets the condition id < 100. If you want to use time, just replace it with the corresponding condition.

For more details, you can refer to the documentation:

| username: Kongdom | Original post link

For this kind, I usually write Kettle scripts to loop and extract data by date.

| username: Jellybean | Original post link

Yes, writing a script to periodically export is also very convenient.

In practice, most of the time I write a Flink stream for real-time synchronization, which allows me to control the data volume, concurrency, and QPS at any time. It’s very convenient.

| username: liuis | Original post link

Write a script to export it yourself; these tools are developed in Go.

| username: 胡杨树旁 | Original post link

You can add conditions like --where or --sql when exporting.

| username: caiyfc | Original post link

In the same database, if the data volume is small, you can directly use insert into select, just add the time condition after where. If the data volume is large, you can use Dumpling to export, and you can add --where to filter the data.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.