Exporting TiDB Database Table Data by Time

translator_bot · June 22, 2024, 1:20pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb数据库表数据按时间导出

| username: 普罗米修斯

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.4
[Encountered Issues:]

Are there any tools to export database table data on a 3-month cycle and then import it back into the database? The purpose is to export part of the data from the production database and then import it into the development database for testing, all within the same TiDB instance.
How should one delete databases or tables with large amounts of data to avoid significant impact on business operations?

translator_bot · June 22, 2024, 1:20pm

| username: Jellybean | Original post link

There are actually many methods.

For small amounts of offline data, you can use Dumpling and Lightning to import. For large amounts of data, use BR, or use TiCDC for real-time synchronization.
It is recommended to use partitioned tables, partitioned by time. This allows you to delete historical data at any time with minimal impact on the cluster.

translator_bot · June 22, 2024, 1:20pm

| username: 普罗米修斯 | Original post link

Can duimping specify a 3-month export by time?

translator_bot · June 22, 2024, 1:20pm

| username: Jellybean | Original post link

My impression is that you can specify the WHERE condition to export data. If the data volume is not large, you can completely use mysqldump to export and import. We often do this in scenarios with millions or tens of millions of rows, as it is simple and straightforward when time is not a constraint.

translator_bot · June 22, 2024, 1:20pm

| username: 普罗米修斯 | Original post link

Okay, it seems that dumpling cannot specify the time. I’ll use mysqldump instead.

translator_bot · June 22, 2024, 1:20pm

| username: 普罗米修斯 | Original post link

How do I do this specifically? Please let me know.

translator_bot · June 22, 2024, 1:20pm

| username: Jellybean | Original post link

It’s actually very simple. Choose range partitioning (by time). You can refer to the official documentation for specific operations:

If you have any questions, feel free to discuss.

translator_bot · June 22, 2024, 1:20pm

| username: Jellybean | Original post link

You can specify the time for export, similar to mysqldump.

You can use --where <SQL where expression> to specify the records to be exported:

./dumpling -u root -P 4000 -h 127.0.0.1 -o /tmp/test --where "id < 100"

The above command will export the data from the table that meets the condition id < 100. If you want to use time, just replace it with the corresponding condition.

For more details, you can refer to the documentation:

translator_bot · June 22, 2024, 1:20pm

| username: Kongdom | Original post link

For this kind, I usually write Kettle scripts to loop and extract data by date.

translator_bot · June 22, 2024, 1:20pm

| username: Jellybean | Original post link

Yes, writing a script to periodically export is also very convenient.

In practice, most of the time I write a Flink stream for real-time synchronization, which allows me to control the data volume, concurrency, and QPS at any time. It’s very convenient.

translator_bot · June 22, 2024, 1:20pm

| username: liuis | Original post link

Write a script to export it yourself; these tools are developed in Go.

translator_bot · June 22, 2024, 1:20pm

| username: 胡杨树旁 | Original post link

You can add conditions like --where or --sql when exporting.

translator_bot · June 22, 2024, 1:20pm

| username: caiyfc | Original post link

In the same database, if the data volume is small, you can directly use insert into select, just add the time condition after where. If the data volume is large, you can use Dumpling to export, and you can add --where to filter the data.

translator_bot · June 22, 2024, 1:20pm

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.