How to migrate a table with 1 billion rows in TiDB to another partitioned table with the same fields in the same database

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 怎样把tidb单表10亿数据量的表迁移到同库另外一张相同字段的分区表中

| username: TiDBer_FMWXa7ja

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】6.5
【Reproduction Path】
【Encountered Problem: Problem Phenomenon and Impact】There is an existing table with 1 billion records in TiDB without partitions, and I want to migrate it to another table with the same fields but with partitions.
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】
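
To make the setup concrete, here is a minimal sketch of the two tables being discussed; the table names, columns, and the monthly range-partitioning scheme are purely hypothetical and only illustrate the shape of the migration.

-- Hypothetical source table: no partitions.
CREATE TABLE source_table (
    id         BIGINT NOT NULL,
    c_datetime DATETIME NOT NULL
    -- ... remaining columns, identical to target_table
);

-- Hypothetical target table: same columns, range-partitioned by month.
CREATE TABLE target_table (
    id         BIGINT NOT NULL,
    c_datetime DATETIME NOT NULL
    -- ... remaining columns, identical to source_table
)
PARTITION BY RANGE (TO_DAYS(c_datetime)) (
    PARTITION p202101 VALUES LESS THAN (TO_DAYS('2021-02-01')),
    PARTITION p202102 VALUES LESS THAN (TO_DAYS('2021-03-01')),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);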

| username: xfworld | Original post link

Two points need to be considered:

  1. The transaction is too large and needs to be split.
  2. Execution timeout may cause rollback.

It is recommended to read and insert in time segments to reduce transaction size and maintain consistency.
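
For illustration, a minimal sketch of that segmented approach, assuming a datetime column; the table and column names are hypothetical (the same ones used in the concat example later in the thread).

-- Copy one bounded time segment per statement so that each
-- INSERT ... SELECT is a small, independent transaction.
INSERT INTO target_table
SELECT * FROM source_table
WHERE c_datetime >= '2021-01-01' AND c_datetime < '2021-01-02';

INSERT INTO target_table
SELECT * FROM source_table
WHERE c_datetime >= '2021-01-02' AND c_datetime < '2021-01-03';

-- ...repeat per segment, then spot-check row counts before switching over:
SELECT COUNT(*) FROM source_table
WHERE c_datetime >= '2021-01-01' AND c_datetime < '2021-01-02';
SELECT COUNT(*) FROM target_table
WHERE c_datetime >= '2021-01-01' AND c_datetime < '2021-01-02';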

| username: 裤衩儿飞上天 | Original post link

Dumpling logical migration

| username: tidb菜鸟一只 | Original post link

I also suggest migrating with Dumpling + Lightning.

| username: TiDBer_FMWXa7ja | Original post link

Dumpling exports data. The official website video says it generally handles small tables, within tens of gigabytes. I’m not sure if it can handle tables of several hundred gigabytes.

| username: Kongdom | Original post link

You can insert directly by partition field in segments using “insert into xxx select * from tb where partition condition”.

| username: TiDBer_FMWXa7ja | Original post link

This method is feasible, but our leader wants to use existing tools because this method is inefficient and requires script maintenance.

| username: Kongdom | Original post link

:thinking: What maintenance script? Can’t it be generated in batches? For scripts like this where only the conditions are different, I use the concat method to generate them in batches and then throw them into the query analyzer for execution.

Of course, if you regularly use the tools recommended above, you can directly use those tools. Besides the tools mentioned above, I also recommend using ETL tools like Kettle.

| username: 胡杨树旁 | Original post link

If you’re concerned about the risk of large data volumes, you can add conditions to export data when using Dumpling.

| username: TiDBer_FMWXa7ja | Original post link

Could you provide an example of generating in bulk using concat? How do you commit? Does each statement commit automatically?

| username: Kongdom | Original post link

This is a statement to generate inserts by date range

select 
	concat("insert into target_table select * from source_table where c_datetime >= '",datelist,"' and c_datetime < date_add('",datelist,"', INTERVAL 1 day);") 
from calendar
where datelist < '2021-01-11'
order by datelist

calendar is a calendar table, and there are many methods online to generate a calendar table.
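
If you would rather not maintain a calendar table at all, a recursive CTE (available in TiDB since v5.1, so usable on the v6.5 cluster here) can generate the date list inline; this is just one sketch of such a method, reusing the names from the example above.

with recursive calendar (datelist) as (
	select cast('2021-01-01' as date)   -- assumed start date
	union all
	select datelist + INTERVAL 1 day
	from calendar
	where datelist < '2021-01-11'       -- same upper bound as above
)
select 
	concat("insert into target_table select * from source_table where c_datetime >= '",datelist,"' and c_datetime < date_add('",datelist,"', INTERVAL 1 day);") 
from calendar
order by datelist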

The statements are automatically committed without additional transactions. We generally use this when initializing data switching, so it does not involve transaction consistency issues.
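
To make the commit behavior explicit: with autocommit enabled (the TiDB default), every generated statement commits on its own the moment it finishes, so there is no single large transaction to split. A quick way to confirm the setting before running the batch:

-- Each generated insert runs as its own transaction when autocommit is ON.
SHOW VARIABLES LIKE 'autocommit';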

| username: Kongdom | Original post link

By the way, there is also a simple method. If you were using MySQL before, you can directly use mysqldump for data export and import. After all, mysqldump comes with MySQL and does not require additional installation.

| username: 裤衩儿飞上天 | Original post link

300G should still be tolerable. If mysqldump can handle it, then dumpling and lightning should have no problem at all.

| username: Hacker007 | Original post link

You can try using Navicat’s data transfer feature, but it might not work for 1 billion records.

| username: Kongdom | Original post link

:wink: Navicat has copyright issues, and the company has already requested that it not be used.

| username: TiDBer_FMWXa7ja | Original post link

Can lightning import specify the target database name and table name? Because the insert statements exported by dumpling have table names, I want to change the table name.

| username: 裤衩儿飞上天 | Original post link

lightning does not support route

  1. Manually change the table names in the export file
  2. Configure route according to the method mentioned below.

| username: 胡杨树旁 | Original post link

[[routes]]
schema-pattern = "*"
table-pattern = "*"

target-schema = "*"
target-table = "*"

It should be supported. The above is the source end, and the below is the target end. Just map it.

| username: TiDBer_FMWXa7ja | Original post link

Let me ask, after generating the insert statement with concat, how do you trigger its execution in TiDB? TiDB does not have stored procedures or triggers.

| username: Jellybean | Original post link

We have migrated a table with over 2 billion rows, similar to the original poster’s situation, converting a regular table to a partitioned table to facilitate the deletion of historical data later on.

  • We use Flink quite frequently here, so I directly used Flink for synchronization, achieving a QPS of around 50,000, making the synchronization very fast. The downside is that the learning curve is steep.
  • If you are not familiar with Flink or have not used it before, it is recommended to use tools for the migration. One good tool is dumpling + lightning, which supports multi-threaded export and import, making it quite efficient. The downside is that you need to spend some time reading the documentation. Additionally, lightning supports configuring routing, meaning you can import data into tables with different names. Other experts have already provided specific operational methods, so I won’t elaborate further here.
  • If you have ample time, for example, if you can spend a few days slowly importing, you can use the most basic method of exporting with mysqldump, then use the sed command to change the table names, and then import the data. The advantage of this method is its simplicity and ease of operation; the downside is that it is single-threaded for export and import, making it relatively slow. Exporting should be relatively quick; I have used mysqldump to export an 800GB table in a few hours, and importing a SQL file with 1 billion rows of data is estimated to take 2-3 days. You can just set up a background task and let it run slowly.

Which solution to use depends on the actual situation.