How to Configure TiDB Lightning to Ignore Duplicate Data During Import

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lighting导入有重复数据 如何配置忽略导入

| username: tidb狂热爱好者

level = "info"
file = "1tidb-lightning.log"
max-error = 9223372036854775807

# Choose the import mode to use
backend = "tidb"

# Set the temporary storage location for sorted key-value pairs. The target path must be an empty directory.
#sorted-kv-dir = "/data"
#strategy = "replace"
#threshold = 9223372036854775807

# Table schema file
pattern = '(?i)^(?:[^/]/)tade..order_ops_his_v2_..[0-9].csv'
schema = "old_system_data"
table = "order_ops_his_v2_0"
type = "csv"
#pattern = '(?i)^(?:[^/]
#schema = "old_system_data"
table = "trade_settlement_his_v2_0"
#type = "csv"


# Source data directory
data-source-dir = "/dataa/"

# Configure wildcard rules. The default rules filter out all tables in the mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA, and INSPECTION_SCHEMA system databases.
# If this item is not configured, an "unable to find schema" error occurs when importing system tables.


# Information about the target cluster
host = ""
port = 4000
user = "root"
password = ""

# Table schema information is obtained from the TiDB "status port".
#status-port = 10080

# Address of the cluster's PD
#pd-addr = ""
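The wildcard-rule comment in the config above is not followed by any key. Assuming it describes the `filter` option of the `[mydumper]` section (the section header is an assumption about where these flattened keys belong; check the configuration reference for your Lightning version), the default rules it mentions can be written out explicitly like this sketch:

```toml
[mydumper]
# Source data directory (value taken from the post)
data-source-dir = "/dataa/"
# Import every table except those in the system databases
filter = ['*.*', '!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*', '!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']
```

Narrowing the first rule (for example to `'old_system_data.*'`) would limit the import to one database while keeping the system databases excluded.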

| username: 像风一样的男子

There is a conflict in your configuration file. Refer to the documentation for how to configure it.

| username: tidb狂热爱好者

I added it to the configuration file, but after a while Lightning reported a conflict and exited.

| username: 像风一样的男子

strategy = "ignore"?
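A minimal sketch of what that suggestion could look like, assuming TiDB Lightning v7.3.0 or later, where duplicate handling is configured in a `[conflict]` section (the commented-out `strategy` and `threshold` lines in the original config appear to come from that section). In the logical import mode (`backend = "tidb"`), `"ignore"` keeps the existing row and skips the conflicting incoming row. Uncommenting `threshold` is also an assumption: if the number of conflicting rows exceeds this limit, Lightning stops the import with an error, which may explain why the task exited. Check the default value for your version.

```toml
[tikv-importer]
# Logical import mode, as in the original post
backend = "tidb"

[conflict]
# Keep the existing row and skip the conflicting incoming row
strategy = "ignore"
# Maximum number of conflicting rows tolerated before the import fails
# (value copied from the commented-out line in the original post)
threshold = 9223372036854775807
```

On versions earlier than v7.3.0, which lack the `[conflict]` section, the logical-import-mode equivalent was `on-duplicate = "ignore"` under `[tikv-importer]`.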