Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 数据导入导出 (Data Import and Export)
I need to export data from a large table A based on a WHERE condition and then import it into an existing table B. Does anyone have a solution for this?
My idea is to use Dumpling to export the data selected by a SQL query to a CSV file and then use Lightning to import it into table B, but the problem is that table B is not empty, so the import reports an error.
Does dumpling export results as SQL?
--sql mode does not support exporting to SQL files.
Error:
create dumper failed: unsupported config.FileType 'sql' when we specify --sql, please unset --filetype or set it to 'csv'
What is --sql? Should it be --filetype sql?
[screenshot]
Oh, got it. So, would replacing --sql with --where meet the requirements?
The structures of the two tables are different, so you need to export only specific fields, which requires --sql (--where only filters rows and still exports all columns).
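For reference, a minimal sketch of such an export, assuming a cluster at 127.0.0.1:4000 and placeholder names (db, A, col1, col2, create_time):

```bash
# Export only the needed rows and columns; --filetype must be csv when --sql is used.
tiup dumpling -h 127.0.0.1 -P 4000 -u root \
  --filetype csv \
  --sql "SELECT col1, col2 FROM db.A WHERE create_time < '2023-01-01'" \
  -o /tmp/dump_a
```

If whole rows were acceptable, -T db.A together with --where "create_time < '2023-01-01'" would filter rows while keeping all columns; since the column sets differ here, --sql is the right tool.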
There are restrictions everywhere.
Use Lightning to import into an empty staging table C, and then slowly insert from C into B (if table B has fewer records, you could go the other way and insert B's data into C instead).
Use the Kettle tool to import.
The data volume is quite large: several years of historical records amounting to hundreds of millions of rows. It's not easy to handle.
If it's within the same database, I usually just write an INSERT ... SELECT statement. If it's across databases or heterogeneous systems, I generally use an ETL tool like Kettle.
Try creating a new table instead.
We can only leave it to the R&D team to write the code for migration.
Take the indirect route: first import into an empty staging table, then insert from it into the target table. If the data volume is large, consider committing in batches. It's not a big problem.
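As a rough sketch of that batching, assuming staging table C has an integer primary key id and that all names and connection details are placeholders:

```bash
#!/usr/bin/env bash
# Copy rows from staging table C into B in 50,000-row batches so each
# transaction stays small.
STEP=50000
MAX=$(mysql -N -h 127.0.0.1 -P 4000 -u root db \
  -e "SELECT IFNULL(MAX(id), 0) FROM C")
for ((lo = 0; lo < MAX; lo += STEP)); do
  mysql -h 127.0.0.1 -P 4000 -u root db -e \
    "INSERT INTO B (col1, col2)
     SELECT col1, col2 FROM C WHERE id > $lo AND id <= $((lo + STEP))"
done
```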
Following this thread and learning about ETL tools from the experts.
If the target table is not empty, TiDB Lightning will report an error and exit; if parallel import mode (parallel-import = true) is enabled, this check is skipped.
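For illustration, the relevant fragment of a Lightning config might look like this (a sketch only; option availability depends on the Lightning version):

```bash
cat > lightning-physical.toml <<'EOF'
[tikv-importer]
backend = "local"        # physical import mode
parallel-import = true   # skip the "target table must be empty" precheck
EOF
```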
If there is indeed nothing wrong with the data being imported, I feel it can be resolved this way.
When performing incremental imports, choosing physical mode might be somewhat counterproductive, because physical mode automatically runs a checksum and an ANALYZE at the end. If there is a lot of historical data, these two operations might take longer than the import itself.
During incremental imports, I feel that the logical import mode is better.
If the field structure and order of the CSV file are consistent with table B, you can use Lightning's tidb mode, which essentially writes with REPLACE INTO.
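A minimal sketch of that setup, assuming the CSV files from Dumpling sit in /tmp/dump_a and have been renamed to the db.B.*.csv pattern so Lightning maps them to table B (connection details are placeholders, and exact option names vary across Lightning versions):

```bash
cat > lightning-tidb.toml <<'EOF'
[tikv-importer]
backend = "tidb"           # logical import mode: writes plain SQL
on-duplicate = "replace"   # handle conflicts with REPLACE INTO

[mydumper]
data-source-dir = "/tmp/dump_a"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
EOF
tiup tidb-lightning -config lightning-tidb.toml
```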
Has the final solution been decided?
Unloading the data is simple: just write an unload tool yourself. For loading, you can use LOAD DATA.
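A sketch of the loading side with LOAD DATA, assuming a comma-delimited CSV with a header row (Dumpling's default) and placeholder file, table, and column names:

```bash
# --local-infile=1 lets the client send a local file to the server.
mysql --local-infile=1 -h 127.0.0.1 -P 4000 -u root db -e \
  "LOAD DATA LOCAL INFILE '/tmp/dump_a/result.000000000.csv'
   INTO TABLE B
   FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
   LINES TERMINATED BY '\n'
   IGNORE 1 LINES
   (col1, col2)"
```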