How to Configure TiDB Lightning to Ignore Duplicate Data During Import

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lighting导入有重复数据 如何配置忽略导入

| username: tidb狂热爱好者


[lightning]

# Log
level = "info"
file = "1tidb-lightning.log"
max-error = 9223372036854775807
[tikv-importer]

# Choose the import mode to use
backend = "tidb"

# Set the temporary storage location for sorted key-value pairs. The target path must be an empty directory.
#sorted-kv-dir = "/data"
[conflict]
#strategy = "replace"
#threshold = 9223372036854775807
[[mydumper.files]]

# Table schema file
pattern = '(?i)^(?:[^/]/)tade..order_ops_his_v2_..[0-9].csv'
schema = "old_system_data"
table = "order_ops_his_v2_0"
type = "csv"
#pattern = '(?i)^(?:[^/]/)trade..trade_settlement_his_v2_..[0-0].csv'
#schema = "old_system_data"
#table = "trade_settlement_his_v2_0"
#type = "csv"

[mydumper]

# Source data directory.
data-source-dir = "/dataa/"

# Configure wildcard rules. The default rules filter out all tables under the mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA, and INSPECTION_SCHEMA system databases.
# If this item is not configured, an "unable to find schema" exception occurs when importing system tables.
filter = ['*.*', '!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*', '!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']
[tidb]

# Information of the target cluster
host = "1.1.1.1"
port = 4000
user = "root"
password = ""

# Table schema information is obtained from the "status port" of TiDB.
#status-port = 10080

# Address of the cluster's PD
#pd-addr = "10.12.8.89:2379"

| username: 像风一样的男子

The configuration file has a [conflict] section for this. Refer to the documentation to configure it.

| username: tidb狂热爱好者

I added it to the configuration file, but after running for a while, Lightning reported a conflict and exited.

| username: 像风一样的男子

Have you tried strategy = "ignore"?
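
For reference, here is a minimal sketch of how that suggestion might look in the configuration posted above. It assumes TiDB Lightning v7.3.0 or later, where the [conflict] section is available, and the logical import mode (backend = "tidb") already used in this thread. The thread does not state the version, so treat this as an assumption rather than a confirmed fix.

```toml
[tikv-importer]
# Logical import mode, as in the original configuration.
backend = "tidb"

[conflict]
# Assumption: TiDB Lightning v7.3.0 or later.
# "ignore" keeps the row that already exists and skips the new row
# when a primary key or unique key conflict is detected.
# This value applies only to the logical import mode.
strategy = "ignore"
```

On older versions, the equivalent setting for the logical import mode is, as far as I know, on-duplicate = "ignore" under [tikv-importer]. Check the configuration reference for the version you are running.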