Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: lightning导入CSV时,无法正确区分处理空字符串和空值(""和NULL)
[TiDB Usage Environment] Production Environment / Testing / Poc
Data Migration Import
[TiDB Version]
v7.1.0, v7.1.1, and v7.1.2
[Reproduction Path] Operations performed to reproduce the issue
Follow the standard process to import the CSV file, as shown in the CSV and Lightning configuration files below:
light.toml (429 bytes)
lightningtest.NTEST.csv (60 bytes)
[Encountered Issue: Problem Description and Impact]
In the CSV file, a comma is used as the field separator, and double quotes are used as the delimiter.
The table structure for testing is as follows:
create table NTEST(id bigint primary key, name varchar(32), remark varchar(128));
The expected result is that the CSV originates from another non-MYSQL database, where the same column contains both empty strings (i.e., “”) and NULL values. After import, “” should be treated as an empty string in TiDB, and NULL values should be treated as NULL.
The actual result is that both empty strings and NULL values are treated as NULL values and stored in the TiDB database, which does not match the expectation.

[Resource Configuration]
Not applicable
[Attachments: Screenshots/Logs/Monitoring]
In TiDB, null and ‘’ are not distinguished; both are considered null.
The Lightning tool can distinguish between null values and empty strings. You can try these two parameters:
Whether the CSV file contains NULL.
If not-null = true, no columns in the CSV can be parsed as NULL.
not-null = false
If not-null = false (i.e., the CSV can contain NULL),
fields with the following value will be parsed as NULL.
null = ‘\N’
Whether to escape "" within fields
This is definitely unreasonable. In the end, we also used the method of replacing and then updating to write ‘’ into TiDB.
In the scenario I simulated, in the lightning configuration file:
Regarding the not-null parameter:
not-null is set to false because the CSV actually comes from another non-MySQL ecosystem database, and the NULL values need to be migrated to TiDB.
Regarding the null=‘\N’ parameter:
During our testing, since the source database exports NULL values to CSV as two consecutive delimiters, i.e., two commas ,,
, if null=‘\N’, these NULL values cannot be processed and imported into TiDB. Therefore, null needs to be set to ‘’.
In summary, this import is limited by the CSV exported from the source database, where the exported CSV can clearly distinguish between empty strings and NULL values (,“”, and ,). We hope that lightning can also distinguish and process these values into TiDB according to this logic, but we have not found a feasible configuration solution.
Due to the project’s time constraints, we bypassed this issue by using sed for batch replacement and then performing a batch update after importing into TiDB.
We have encountered the same issue.
It is indeed a functional deficiency. If the data volume is large, manual replacement becomes very troublesome.
You can post an issue in the question section. If there are indeed optimization points in terms of functionality, it should be optimized in the future.
It is necessary to recognize empty strings and null values according to the settings of Lightning.
I’ve encountered this issue before. If the quantity is small, it’s generally better to fill the null values with a special string at the source and then process it after importing.
I’ve encountered this before; you can only handle it manually for now.
Lightning indeed handles “”, and , in the same way 
This is quite troublesome to handle, it seems it can only be solved manually 
I would like to ask if there are any plans to solve this problem in this scenario in the future, or how to submit a request.
It seems that solving it at the code technical level shouldn’t be too complicated.
There is a solution for MySQL data sources; you can use Dumpling to specify the exported null values during the export process. Dumpling Overview | PingCAP Docs
For non-MySQL data sources, there doesn’t seem to be a plan at the moment. You can try submitting an issue at Sign in to GitHub · GitHub.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.