Specifies the character set of the source data file. During import, Lightning converts the source file from the specified character set to UTF-8.
Currently, this setting applies only to CSV files. The following options are supported:
- utf8mb4: The source data file uses UTF-8 encoding.
- GB18030: The source data file uses GB-18030 encoding.
- GBK: The source data file uses GBK encoding (GBK encoding is an extension of the GB-2312 character set, also known as Code Page 936).
- binary: Do not attempt to convert encoding (default).
If this setting is left empty, it defaults to “binary”, and no encoding conversion is attempted.
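The option semantics above can be sketched in a few lines of Python. This is an illustration only, not Lightning's implementation: the `to_utf8` helper, the `CODECS` table, the pairing of option names with Python codec names, and the case-insensitive matching are all assumptions made for this example.

```python
# Hypothetical mapping from the option values listed above to Python codec names.
# None means "binary": the bytes are passed through without conversion.
CODECS = {
    "utf8mb4": "utf-8",
    "gb18030": "gb18030",
    "gbk": "gbk",
    "binary": None,
}


def to_utf8(raw: bytes, charset: str = "") -> bytes:
    """Transcode raw bytes to UTF-8 according to the configured character set."""
    # An empty setting falls back to "binary", as described above.
    name = charset.lower() or "binary"
    if name not in CODECS:
        raise ValueError(f"unsupported character set: {charset!r}")
    codec = CODECS[name]
    if codec is None:
        return raw  # binary: no conversion attempted
    return raw.decode(codec).encode("utf-8")


gbk_bytes = "数据".encode("gbk")
print(to_utf8(gbk_bytes, "GBK") == "数据".encode("utf-8"))  # True
print(to_utf8(gbk_bytes) == gbk_bytes)  # True: empty setting leaves bytes untouched
```

Note that the sketch never inspects the bytes to guess their encoding; it relies entirely on the `charset` argument.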
Note that Lightning makes no assumptions about the character set of the source data file. It transcodes and imports data based solely on this setting.
If the setting does not match the actual encoding of the source data file, the import might fail, lose data, or produce garbled data.
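The effect of a mismatched setting can be reproduced with Python's built-in codecs. The following is a minimal sketch that uses Python decoding as a stand-in for the transcoding step; the sample text and variable names are made up for illustration, and how Lightning itself reports or handles such errors is not shown here.

```python
text = "数据导入"
gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# Setting matches the file: GBK bytes decoded as GBK round-trip cleanly.
matched = gbk_bytes.decode("gbk")

# Mismatch, strict decoding: GBK bytes are not valid UTF-8, so decoding fails.
try:
    gbk_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Mismatch, lenient decoding: UTF-8 bytes read as GBK yield different characters.
garbled = utf8_bytes.decode("gbk", errors="replace")

print(matched == text)   # True
print(strict_failed)     # True
print(garbled == text)   # False
```

The lenient case is the more dangerous one in practice, because it completes without an error and the corruption is only visible when the stored data is read back.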