Does this situation during the merge phase of a TiDB Lightning import mean there is duplicate data?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb light导入 合并的时候遇到这个情况是不是有重复数据

| username: tidb狂热爱好者

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots / Logs / Monitoring]

7.6G 8fa24e7f-f3b3-56f9-b915-fc7af46a8607
0 8fa24e7f-f3b3-56f9-b915-fc7af46a8607.sst
3.6G duplicates
2.7G f91c78b5-f378-51c0-bb21-5390a0cc9567
0 f91c78b5-f378-51c0-bb21-5390a0cc9567.sst

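
The listing above looks like `du -sh` output for Lightning's local sort directory. It shows two engine directories named by UUID, two empty `.sst` entries, and a 3.6G directory named `duplicates`. If you want to reproduce a listing like this and flag the `duplicates` entry from a script, here is a minimal standard-library Python sketch. It assumes the path you pass is the directory Lightning sorts into (the OP's exact path is not shown). It also treats `duplicates` as Lightning's duplicate-detection store, which is an inference from the name. Sizes are apparent file sizes, so they can differ slightly from `du`, which counts allocated disk blocks. A non-empty `duplicates` directory is a hint, not a count of conflicting rows.

```python
import os
import sys


def tree_size(path: str) -> int:
    """Total apparent size in bytes of the regular files under path (symlinks skipped)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def summarize(sort_dir: str) -> dict[str, int]:
    """Map each top-level entry of sort_dir (engine dirs, .sst files, duplicates) to its size."""
    return {
        entry: tree_size(os.path.join(sort_dir, entry))
        for entry in sorted(os.listdir(sort_dir))
    }


def human(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, similar in spirit to `du -h`."""
    size = float(num_bytes)
    for unit in ("", "K", "M", "G"):
        if size < 1024:
            return f"{int(size)}" if unit == "" else f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


def report(sort_dir: str) -> None:
    """Print a du-style listing and flag a non-empty 'duplicates' entry."""
    sizes = summarize(sort_dir)
    for name, size in sizes.items():
        print(f"{human(size)}\t{name}")
    if sizes.get("duplicates", 0) > 0:
        print("note: 'duplicates' is non-empty; confirm in Lightning's conflict records")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

For example: `python sort_dir_report.py /path/to/sort-dir` (the script name and path are placeholders).
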
| username: yulei7633 | Original post link

The output indicates that there is duplicate data.

| username: 像风一样的男子 | Original post link

What is your duplicate-resolution parameter set to?

| username: 小于同学 | Original post link

How are the parameters set?

| username: DBAER | Original post link

Take a look at duplicate-resolution.

| username: dba远航 | Original post link

This is probably caused by importing data into the same table (or the same set of tables) more than once.
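
When the same data is imported twice, or two source files overlap, the same primary key (or unique key) appears more than once, and Lightning can report that as a conflict. One way to test this before re-importing is to scan the source files for repeated key values. Below is a minimal Python sketch. It assumes CSV sources with a header row and a known list of key columns; the `id` column and the file contents in the example are hypothetical. It keeps every key in memory, so very large tables would need a sort-based or sharded approach instead.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def duplicate_keys(
    sources: Iterable[TextIO], key_columns: list[str]
) -> dict[tuple[str, ...], int]:
    """Return every key tuple that occurs more than once across all sources, with its count.

    Each source is an open CSV file (or file-like object) whose header row
    includes all of key_columns. All keys are held in memory.
    """
    counts: Counter = Counter()
    for src in sources:
        for row in csv.DictReader(src):
            counts[tuple(row[col] for col in key_columns)] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Two hypothetical source files for one table that overlap on id=2.
    part1 = io.StringIO("id,name\n1,alice\n2,bob\n")
    part2 = io.StringIO("id,name\n2,bob\n3,carol\n")
    print(duplicate_keys([part1, part2], ["id"]))
```

For real files, open each one with `open(path, newline="", encoding="utf-8")` and pass the handles in, using your table's actual primary-key or unique-key column names.
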

| username: 小龙虾爱大龙虾 | Original post link

Is this the directory you mean, the sorted-kv-dir specified in the Lightning configuration?

| username: TiDBer_aaO4sU46 | Original post link

Yes, that is a clear sign of duplicate data.

| username: TiDBer_q9aZZ7Vr | Original post link

You can check the conflict_error_v1 table under the lightning_task_info database for confirmation.

| username: FutureDB | Original post link

Did you import with Lightning in logical mode or physical mode? And what is your conflict handling strategy, that is, the value of strategy under [conflict]?

| username: zhang_2023 | Original post link

Check the Lightning configuration file.