Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 数据导入导出 (Data Import and Export)
I need to export data from a large table A based on a WHERE condition and then import it into an existing table B. Does anyone have a solution for this?
My idea is to use Dumpling to export the data selected by a SQL query to a CSV file and then use Lightning to import it into table B, but the problem is that table B is not empty, so the import reports an error.
Does dumpling export results as SQL?
--sql mode does not support exporting to SQL files.
Error:
create dumper failed: unsupported config.FileType 'sql' when we specify --sql, please unset --filetype or set it to 'csv'
What is --sql? Should it be --filetype sql?
[screenshot]
Oh, got it. So, would replacing --sql with --where meet the requirements?
The structures of the two tables are different, so you need to export only specific fields, which requires --sql (--where only filters rows and still exports all columns).
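For reference, a minimal sketch of such an export, assuming a cluster at 127.0.0.1:4000 and placeholder names (db, A, col1, col2, create_time):

```bash
# Export only the needed rows and columns; --filetype must be csv when --sql is used.
tiup dumpling -h 127.0.0.1 -P 4000 -u root \
  --filetype csv \
  --sql "SELECT col1, col2 FROM db.A WHERE create_time < '2023-01-01'" \
  -o /tmp/dump_a
```

If whole rows were acceptable, -T db.A together with --where "create_time < '2023-01-01'" would filter rows while keeping all columns; since the column sets differ here, --sql is the right tool.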
There are restrictions everywhere.
Use Lightning to import into an empty staging table C, and then slowly insert from C into B (if table B has fewer records, you could go the other way and insert B's data into C instead).
Use the Kettle tool to import.
The data volume is quite large: several years of historical records amounting to hundreds of millions of rows. It's not easy to handle.
If it's within the same database, I usually just write an INSERT ... SELECT statement. If it's across databases or heterogeneous systems, I generally use an ETL tool like Kettle.
Try creating a new table instead.
We can only leave it to the R&D team to write the code for migration.
Take the indirect route: first import into an empty staging table, then insert from it into the target table. If the data volume is large, consider committing in batches. It's not a big problem.
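As a rough sketch of that batching, assuming staging table C has an integer primary key id and that all names and connection details are placeholders:

```bash
#!/usr/bin/env bash
# Copy rows from staging table C into B in 50,000-row batches so each
# transaction stays small.
STEP=50000
MAX=$(mysql -N -h 127.0.0.1 -P 4000 -u root db \
  -e "SELECT IFNULL(MAX(id), 0) FROM C")
for ((lo = 0; lo < MAX; lo += STEP)); do
  mysql -h 127.0.0.1 -P 4000 -u root db -e \
    "INSERT INTO B (col1, col2)
     SELECT col1, col2 FROM C WHERE id > $lo AND id <= $((lo + STEP))"
done
```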
Following this thread and learning about ETL tools from the experts.
If the target table is not empty, TiDB Lightning will report an error and exit; if parallel import mode (parallel-import = true) is enabled, this check is skipped.
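For illustration, the relevant fragment of a Lightning config might look like this (a sketch only; option availability depends on the Lightning version):

```bash
cat > lightning-physical.toml <<'EOF'
[tikv-importer]
backend = "local"        # physical import mode
parallel-import = true   # skip the "target table must be empty" precheck
EOF
```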
If there is indeed nothing wrong with the data being imported, I feel it can be resolved this way.
When performing incremental imports, choosing physical mode might be somewhat counterproductive, because physical mode automatically runs a checksum and an ANALYZE at the end. If there is a lot of historical data, these two operations might take longer than the import itself.
During incremental imports, I feel that the logical import mode is better.
If the field structure and order of the CSV file are consistent with table B, you can use Lightning's tidb mode, which essentially writes with REPLACE INTO.
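A minimal sketch of that setup, assuming the CSV files from Dumpling sit in /tmp/dump_a and have been renamed to the db.B.*.csv pattern so Lightning maps them to table B (connection details are placeholders, and exact option names vary across Lightning versions):

```bash
cat > lightning-tidb.toml <<'EOF'
[tikv-importer]
backend = "tidb"           # logical import mode: writes plain SQL
on-duplicate = "replace"   # handle conflicts with REPLACE INTO

[mydumper]
data-source-dir = "/tmp/dump_a"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
EOF
tiup tidb-lightning -config lightning-tidb.toml
```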
Has the final solution been decided?
Unloading the data is simple: just write an unload tool yourself. For loading, you can use LOAD DATA.
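A sketch of the loading side with LOAD DATA, assuming a comma-delimited CSV with a header row (Dumpling's default) and placeholder file, table, and column names:

```bash
# --local-infile=1 lets the client send a local file to the server.
mysql --local-infile=1 -h 127.0.0.1 -P 4000 -u root db -e \
  "LOAD DATA LOCAL INFILE '/tmp/dump_a/result.000000000.csv'
   INTO TABLE B
   FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
   LINES TERMINATED BY '\n'
   IGNORE 1 LINES
   (col1, col2)"
```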