Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb两亿数据迁移到分区表的解决方案 (a solution for migrating 200 million rows in TiDB to a partitioned table).
I want to know how to migrate a regular table with 200 million rows to a partitioned table in the same database. I have tried Flink SQL, but the query times out. Does anyone have any practical methods?
Migrating that much data in one go inevitably means large transactions and runs into various limits. You can split the migration into micro-batch SQL so that each batch commits as a small transaction, as sketched below.
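On newer TiDB versions (v6.5 or later, if I remember right) the built-in non-transactional DML can do this batching for you. A minimal sketch, assuming a source table mydb.src_table, a pre-created partitioned target mydb.part_table, and an integer primary key id (all hypothetical names):

```sql
-- Sketch only: the table names and the `id` shard column are assumptions.
-- TiDB splits the statement into batches of 50000 rows by `id`, and each
-- batch commits as its own small transaction instead of one huge one.
BATCH ON mydb.src_table.id LIMIT 50000
INSERT INTO mydb.part_table
SELECT * FROM mydb.src_table;
```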
Alternatively, you can use Dumpling + Lightning for concurrent export and import. For data at this scale, that approach is also the more commonly recommended one.
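For reference, the export could look roughly like this (connection details and mydb.src_table are placeholders, and the flags may need tuning for your environment):

```bash
# -t: export threads; -r: rows per chunk (enables intra-table concurrency);
# -F: max size per output file; -T: export only the source table.
tiup dumpling -h 127.0.0.1 -P 4000 -u root -p '<password>' \
  -T mydb.src_table --filetype sql \
  -t 8 -r 200000 -F 256MiB \
  -o /data/dump
```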
Could you split the migration into multiple steps along the partition dimension? With a standard cluster configuration, 200 million rows is not considered a lot to migrate.
Then it’s still dumpling+lightning…
Using Flink SQL for such a large amount of data is not a good solution, both in terms of execution efficiency and the impact on the database. It is recommended to follow the advice of the experts above and use Dumpling for export and Lightning for import. This method is relatively stable and much more efficient than Flink SQL. We have tried it before: exporting 200GB of data with Dumpling and then importing it with Lightning (Local mode), which took less than 2 hours.
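For anyone trying the same thing, a Lightning config for Local mode might look roughly like the following; all paths, addresses, and the src_table → part_table route are assumptions and need to be adapted (the partitioned target table has to exist already):

```toml
# tidb-lightning.toml (sketch; every value below is a placeholder)
[tikv-importer]
# Local backend is the fastest, but the target table should be empty and
# the cluster will feel the import load while it runs.
backend = "local"
sorted-kv-dir = "/data/sorted-kv"

[mydumper]
data-source-dir = "/data/dump"

# Route the dumped regular table into the new partitioned table.
[[routes]]
schema-pattern = "mydb"
table-pattern = "src_table"
target-schema = "mydb"
target-table = "part_table"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "127.0.0.1:2379"
```

Then run it with something like `tiup tidb-lightning -config tidb-lightning.toml` and watch the log until the import and the final checksum finish.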
Indeed, using the dumpling+lightning method is more appropriate.
Use the dumpling + lightning method for concurrent export and import.
Use dumpling + lightning to achieve concurrent export and import.
I simply stopped using partitioned tables. The benefits of partitioned tables in TiDB are basically limited to dropping partitions, and there are quite a few bugs.
Selecting data from the source table in batches by the partition key and inserting it into the target table should be the fastest, since the copy stays entirely inside the database. If you’re worried about large transactions, consider using a stored procedure for the migration.
The TiDB community edition has never supported stored procedures.
200 million records, estimated to be within 100 GB, can still be handled with Dumpling.
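To sanity-check the size before deciding, you can look at the statistics-based estimate in information_schema (mydb.src_table is a placeholder):

```sql
-- Rough, statistics-based size estimate of the source table.
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS approx_size_gb
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'src_table';
```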
The simplest way is to split the work by primary key range into multiple WHERE clauses and run INSERT INTO ... SELECT * FROM the old table for each range; 200 million rows is not considered large. For example:
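Something along these lines, assuming an integer primary key id and hypothetical table names mydb.src_table / mydb.part_table:

```sql
-- Each statement is its own small transaction; run them sequentially or
-- a few at a time, and adjust the range width to your row size.
INSERT INTO mydb.part_table SELECT * FROM mydb.src_table WHERE id >= 0       AND id < 1000000;
INSERT INTO mydb.part_table SELECT * FROM mydb.src_table WHERE id >= 1000000 AND id < 2000000;
INSERT INTO mydb.part_table SELECT * FROM mydb.src_table WHERE id >= 2000000 AND id < 3000000;
-- ...continue until the maximum id is covered.
```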
Oh, I didn’t see that he mentioned it was the community edition…
People usually assume the community edition by default~ With the enterprise edition, you would generally contact the vendor for support.