Solution for Migrating 200 Million Records to Partitioned Tables in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb两亿数据迁移到分区表的解决方案 (Solution for migrating 200 million rows of data to a partitioned table in TiDB)

| username: TiDBer_W8QSBmpH

I want to know how to migrate a regular table with 200 million rows to a partitioned table in the same database. I have tried Flink SQL, but the query times out. Does anyone have any practical methods?

| username: Jellybean | Original post link

Migrating such a large amount of data in one go inevitably means huge transactions and runs into various limits. You can split the migration into micro-batch SQL statements so that each batch commits as a small transaction.
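
A minimal sketch of the micro-batch idea, assuming a source table `orders` with an integer primary key `id` and an already-created partitioned target table `orders_p` (all names here are hypothetical):

```sql
-- Copy in fixed-size primary-key ranges so each statement commits
-- as its own small transaction.
INSERT INTO orders_p SELECT * FROM orders WHERE id > 0       AND id <= 1000000;
INSERT INTO orders_p SELECT * FROM orders WHERE id > 1000000 AND id <= 2000000;
-- ...repeat until MAX(id) is covered; generate the ranges from
-- SELECT MIN(id), MAX(id) FROM orders; with a small script.
```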

Alternatively, you can use Dumpling + Lightning for concurrent export and import. At this data volume, that is the more recommended approach.
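
As a rough sketch, the export side could look like this (host, credentials, table name, and sizes are all placeholders, not values from the thread):

```shell
# Export only the source table, split into many chunks/files so that
# Lightning can import them concurrently:
#   -t 8        export threads
#   -r 200000   rows per chunk (enables intra-table concurrency)
#   -F 256MiB   max size per output file
tiup dumpling \
  -h 127.0.0.1 -P 4000 -u root -p '<password>' \
  -T test.orders \
  --filetype sql \
  -t 8 -r 200000 -F 256MiB \
  -o /data/dump
```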

| username: Kongdom | Original post link

Could you split the migration into multiple steps along the partitioning dimension? On a standard configuration, 200 million records is not that much.
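
For example, if the target table were range-partitioned by month on a `create_time` column, the copy could be done one partition's worth at a time (table and column names are made up for illustration):

```sql
-- One statement per target partition keeps every transaction bounded.
INSERT INTO orders_p
SELECT * FROM orders
WHERE create_time >= '2024-01-01' AND create_time < '2024-02-01';

INSERT INTO orders_p
SELECT * FROM orders
WHERE create_time >= '2024-02-01' AND create_time < '2024-03-01';
-- ...and so on for the remaining partitions.
```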

| username: tidb菜鸟一只 | Original post link

Then it’s still dumpling+lightning…

| username: FutureDB | Original post link

Flink SQL is not a good fit for this much data, in terms of both execution efficiency and impact on the database. I recommend following the advice of the experts above: export with Dumpling and import with Lightning. It is relatively stable and much faster than Flink SQL. We have tried it ourselves: exporting 200 GB of data with Dumpling and importing it with Lightning (local backend) took less than 2 hours.
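
For reference, a minimal Lightning config for the local (physical import) backend might look like the sketch below; the `[[routes]]` section redirects the dumped table into the new partitioned table, and every host, path, and name is a placeholder:

```toml
# tidb-lightning.toml -- minimal sketch; run with:
#   tiup tidb-lightning -config tidb-lightning.toml
[lightning]
level = "info"

[tikv-importer]
# "local" is the fast physical import mode; the target table must be
# empty, and cluster traffic is noticeably affected during the import.
backend = "local"
sorted-kv-dir = "/data/sorted-kv"

[mydumper]
data-source-dir = "/data/dump"

# Route the dumped table test.orders into the partitioned table test.orders_p.
[[routes]]
schema-pattern = "test"
table-pattern = "orders"
target-schema = "test"
target-table = "orders_p"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "127.0.0.1:2379"
```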

| username: ziptoam | Original post link

Indeed, using the dumpling+lightning method is more appropriate.

| username: 鱼跃龙门 | Original post link

Using dumpling+lightning method for concurrent export and import.

| username: TiDBer_rvITcue9 | Original post link

Use dumpling + lightning to achieve concurrent export and import.

| username: 小于同学 | Original post link

dumpling+lightning

| username: zhanggame1 | Original post link

I simply got rid of partitioned tables. In TiDB, the benefit of partitioned tables basically comes down to dropping partitions, and they have quite a few bugs.

| username: vincentLi | Original post link

Selecting from the source table in batches by partition key and inserting into the target table should be the fastest, since the data never has to leave the database. If you are worried about large transactions, consider doing the migration with a stored procedure.

| username: zhaokede | Original post link

TiDB Community Edition has never had stored procedures.

| username: TiDBer_HUfcQIJx | Original post link

dumpling+lightning

| username: zhaokede | Original post link

200 million records, probably under 100 GB in size, is still within what Dumpling can handle.

| username: zhanggame1 | Original post link

The simplest way is to split the job into multiple WHERE clauses by primary key range and run INSERT INTO ... SELECT * FROM the old table for each range. 200 million rows is not that large.
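
Recent TiDB versions can also do this splitting automatically with non-transactional DML (`BATCH`, supported for `INSERT` since v6.5), so you don't have to generate the ranges by hand. A sketch with hypothetical table names:

```sql
-- TiDB splits this into many small transactions of 100000 rows each,
-- sharded on the source column test.orders.id. Note: if it fails
-- midway, already-committed batches are NOT rolled back.
BATCH ON test.orders.id LIMIT 100000
INSERT INTO orders_p SELECT * FROM orders;
```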

| username: vincentLi | Original post link

Oh, I didn’t see that he mentioned it was the community edition…

| username: Kongdom | Original post link

:yum: The default assumption is usually the Community Edition~ With the Enterprise Edition, you would generally contact the vendor for support.