Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb两亿数据迁移到分区表的解决方案 (a solution for migrating 200 million rows in TiDB to a partitioned table).
I want to know how to migrate a regular table with 200 million rows to a partitioned table in the same database. I have tried Flink SQL, but the query times out. Does anyone have any practical methods?
Migrating that much data in one go inevitably means large transactions and runs into various limits. You can split the migration into micro-batch SQL so that each batch commits as a small transaction, as sketched below.
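On newer TiDB versions (v6.5 or later, if I remember right) the built-in non-transactional DML can do this batching for you. A minimal sketch, assuming a source table mydb.src_table, a pre-created partitioned target mydb.part_table, and an integer primary key id (all hypothetical names):

```sql
-- Sketch only: the table names and the `id` shard column are assumptions.
-- TiDB splits the statement into batches of 50000 rows by `id`, and each
-- batch commits as its own small transaction instead of one huge one.
BATCH ON mydb.src_table.id LIMIT 50000
INSERT INTO mydb.part_table
SELECT * FROM mydb.src_table;
```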
Alternatively, you can use Dumpling + Lightning for concurrent export and import. For data at this scale, that approach is also the more commonly recommended one.
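For reference, the export could look roughly like this (connection details and mydb.src_table are placeholders, and the flags may need tuning for your environment):

```bash
# -t: export threads; -r: rows per chunk (enables intra-table concurrency);
# -F: max size per output file; -T: export only the source table.
tiup dumpling -h 127.0.0.1 -P 4000 -u root -p '<password>' \
  -T mydb.src_table --filetype sql \
  -t 8 -r 200000 -F 256MiB \
  -o /data/dump
```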
Could you split the migration into multiple steps along the partition dimension? With a standard cluster configuration, 200 million rows is not considered a lot to migrate.
Then it’s still dumpling+lightning…
Using Flink SQL for such a large amount of data is not a good solution, both in terms of execution efficiency and the impact on the database. It is recommended to follow the advice of the experts above and use Dumpling for export and Lightning for import. This method is relatively stable and much more efficient than Flink SQL. We have tried it before: exporting 200GB of data with Dumpling and then importing it with Lightning (Local mode), which took less than 2 hours.
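For anyone trying the same thing, a Lightning config for Local mode might look roughly like the following; all paths, addresses, and the src_table → part_table route are assumptions and need to be adapted (the partitioned target table has to exist already):

```toml
# tidb-lightning.toml (sketch; every value below is a placeholder)
[tikv-importer]
# Local backend is the fastest, but the target table should be empty and
# the cluster will feel the import load while it runs.
backend = "local"
sorted-kv-dir = "/data/sorted-kv"

[mydumper]
data-source-dir = "/data/dump"

# Route the dumped regular table into the new partitioned table.
[[routes]]
schema-pattern = "mydb"
table-pattern = "src_table"
target-schema = "mydb"
target-table = "part_table"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "127.0.0.1:2379"
```

Then run it with something like `tiup tidb-lightning -config tidb-lightning.toml` and watch the log until the import and the final checksum finish.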
Indeed, using the dumpling+lightning method is more appropriate.
Use the dumpling + lightning method for concurrent export and import.
Use dumpling + lightning to achieve concurrent export and import.
I simply stopped using partitioned tables. The benefits of partitioned tables in TiDB are basically limited to dropping partitions, and there are quite a few bugs.
Selecting data from the source table in batches by the partition key and inserting it into the target table should be the fastest, since the copy stays entirely inside the database. If you’re worried about large transactions, consider using a stored procedure for the migration.
The TiDB community edition has never supported stored procedures.
200 million records, estimated to be within 100 GB, can still be handled with Dumpling.
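To sanity-check the size before deciding, you can look at the statistics-based estimate in information_schema (mydb.src_table is a placeholder):

```sql
-- Rough, statistics-based size estimate of the source table.
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS approx_size_gb
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'src_table';
```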
The simplest way is to split the work by primary key range into multiple WHERE clauses and run INSERT INTO ... SELECT * FROM the old table for each range; 200 million rows is not considered large. For example:
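Something along these lines, assuming an integer primary key id and hypothetical table names mydb.src_table / mydb.part_table:

```sql
-- Each statement is its own small transaction; run them sequentially or
-- a few at a time, and adjust the range width to your row size.
INSERT INTO mydb.part_table SELECT * FROM mydb.src_table WHERE id >= 0       AND id < 1000000;
INSERT INTO mydb.part_table SELECT * FROM mydb.src_table WHERE id >= 1000000 AND id < 2000000;
INSERT INTO mydb.part_table SELECT * FROM mydb.src_table WHERE id >= 2000000 AND id < 3000000;
-- ...continue until the maximum id is covered.
```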
Oh, I didn’t see that he mentioned it was the community edition…
People usually assume the community edition by default~ With the enterprise edition, you would generally contact the vendor for support.