Can TiDB Lightning Skip Specified Columns When Importing Data?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb lightning 导入数据可以跳过指定列吗

| username: TiDBer_ICjSIGXz

[TiDB Usage Environment] Testing
[TiDB Version] 6.5.3
Migrating data from Oracle database to TiDB, with table data volume in the tens of millions.
Source database: Oracle 11g
Target database: TiDB 6.5.3
The target database has already created tables, with an additional auto-increment column compared to the source database.
The current method is to add an empty column during export, which allows data loading.
I would like to inquire if it is possible to specify skipping certain columns of the target database’s table when importing with Lightning.

| username: 春风十里 | Original post link

Couldn’t find such a feature.

| username: dba远航 | Original post link

tiCDC can implement filtering

| username: 大飞哥online | Original post link

Use the skip-column parameter in the configuration file to specify the columns to skip.

| username: 大飞哥online | Original post link

column1 and column2 are the names of the columns to be skipped.

| username: hey-hoho | Original post link

No, refer to the detailed configuration file of Lightning:

| username: 小龙虾爱大龙虾 | Original post link

Just include the column headers when exporting, and it will naturally import according to the columns.

| username: TiDB_C罗 | Original post link

Tested and effective. The default behavior is the following configuration, lightning will import according to the corresponding fields of the columns, even if the order is different, it can still be imported.
[mydumper.csv]
header = true

| username: Kongdom | Original post link

:thinking: I didn’t see this configuration?

| username: FutureDB | Original post link

Correct, you can add specific field names to the beginning of the data file, and the import will be done according to the field names in the file header. Our processing method is handled through a script:

  1. First, retrieve all the field names of the table, then remove the field names that do not need to be imported, and finally form the remaining field names into a header in order (note to add separators between field names);
  2. Add the header as a line to the file header of the data file, and finally, the import can be done according to the required fields.
| username: 路在何chu | Original post link

There is no such feature, right? Why would you want to skip a column?

| username: 烂番薯0 | Original post link

DM is fine.

| username: 哈喽沃德 | Original post link

Not supported

| username: 江湖故人 | Original post link

The original poster is migrating from Oracle to TiDB.

| username: 连连看db | Original post link

No, if the exported data format is CSV, you can process it with a script yourself, which might meet your requirements.

| username: zhanggame1 | Original post link

Migrating data from Oracle database to TiDB at the tens of millions level is not considered a lot. You can consider using Navicat for data transfer, which can handle tens of millions of records per hour without any issues.

| username: Kongdom | Original post link

For data transfer with Navicat, it’s fine for small volumes, but it’s too slow for large volumes. Using the ETL tool Kettle is faster, mainly because it can enable multithreading.

| username: zhanggame1 | Original post link

It’s not very fast. You can open multiple tables, and processing tens of thousands of records per second is still possible. The speed is acceptable for tens of millions of records.

| username: Kongdom | Original post link

:flushed: Is it really that fast? It doesn’t feel that fast in actual use. It might also be because my hardware configuration is relatively low.