In physical import mode, when TiDB Lightning hits a conflict, it only deduplicates the data in the main table and does not delete the corresponding index entries

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-ligntning在物理导入模式下发生冲突时,只会对主表的数据做去重,并不会删除index对应的值

| username: dba-kit

As the title says: even though the conflict strategy is set to remove, if the source file contains conflicting rows, the main table data ends up correct but ADMIN CHECK TABLE reports an error.

Normally, when a record is deleted from the main table, the entries derived from that record in the index trees should be deleted as well.

PS: An error was also reported when trying to clean up the index:

mysql> admin check table t1;
ERROR 8223 (HY000): data inconsistency in table: t1, index: idx_1, handle: {SA0CFCSR03GR3PS, 002420, 1844474736059351040}, index-values:"handle: {SA0CFCSR03GR3PS, 002420, 1844474736059351040}, values: [KindMysqlTime 2016-03-18 KindString SA0CFCSR03GR3PS KindString 002420 KindString SA0CFCSR03GR3PS KindString 002420 KindMysqlTime 2016-03-18]" != record-values:""
mysql> ADMIN RECOVER INDEX t1 idx_1;
ERROR 1105 (HY000): [components/tidb_query_executors/src/table_scan_executor.rs:422]: Data is corrupted, missing data for NOT NULL column (offset = 0)
mysql>
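
One way to narrow down a mismatch like this is to compare a row count that avoids the index with one that is forced through it. This is only a sketch: it assumes the optimizer honors the index hints, and it reuses the table and index names from the error above.

```sql
-- Check only the index named in the error instead of the whole table.
ADMIN CHECK INDEX t1 idx_1;

-- Count rows without using idx_1 (the optimizer may still pick another index).
SELECT COUNT(*) FROM t1 IGNORE INDEX (idx_1);

-- Count entries by scanning idx_1. A larger number than the one above
-- would suggest index entries left behind without a matching row.
SELECT COUNT(*) FROM t1 FORCE INDEX (idx_1);
```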
| username: dba-kit | Original post link

I don’t know whether the new conflict strategy in v8.0 has fixed this problem.

| username: Billmay表妹 | Original post link

@ShawnYan Try it out in version 8.0~

| username: dba-kit | Original post link

The logs also contain the "resolve duplicate rows completed" message, and the final line is ["tidb lightning exit"] [finished=true].
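
When duplicate resolution is enabled in physical import mode, Lightning records the conflicting rows it detected in a table on the target cluster. A minimal sketch for inspecting them follows. The schema name lightning_task_info and the table name conflict_error_v1 are assumptions based on the defaults of some Lightning versions and may differ in your deployment.

```sql
-- Assumed names: schema lightning_task_info, table conflict_error_v1.
-- Each row describes one conflicting key/value pair that Lightning detected.
SELECT * FROM lightning_task_info.conflict_error_v1 LIMIT 10;
```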

| username: seiya-annie | Original post link

Is it a single Lightning node import or multiple Lightning nodes importing in parallel?

| username: seiya-annie | Original post link

Could you please share the Lightning logs?

| username: dba-kit | Original post link

It was a single-node import. However, the logs seem to contain the S3 access key (AK), so it’s not convenient to share all of them. Can I just provide the warnings logged after the import completed?

| username: dba-kit | Original post link

Here is the log from after the import succeeded. The logs from the import phase were all normal. I also confirmed with the developer that the file he provided did contain duplicate data.

| username: changpeng75 | Original post link

Would it be feasible to drop all indexes before the import and rebuild them afterwards?

| username: xiaoqiao | Original post link

Learned something.

| username: dba-kit | Original post link

As long as the source data contains no duplicates, this issue should not occur. If it does occur, you can try ADMIN RECOVER INDEX t1 idx_1; to repair the index. My case was unusual: the dirty data happened to put a NULL value into a NOT NULL column, so the repair failed and I had to drop and rebuild all the indexes.
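
The drop-and-rebuild fallback described above might look like the following sketch. The thread never shows the definition of idx_1, so the column list (col_a, col_b) is a hypothetical placeholder. Take the real one from SHOW CREATE TABLE before dropping anything.

```sql
-- Record the current index definitions first.
SHOW CREATE TABLE t1;

-- Drop the inconsistent index and rebuild it from the table data.
-- (col_a, col_b) is a placeholder, not the real definition of idx_1.
ALTER TABLE t1 DROP INDEX idx_1;
ALTER TABLE t1 ADD INDEX idx_1 (col_a, col_b);

-- Re-run the consistency check to confirm that data and indexes agree.
ADMIN CHECK TABLE t1;
```

Repeat the DROP INDEX and ADD INDEX pair for every index that ADMIN CHECK TABLE reports as inconsistent.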

| username: tidb狂热爱好者 | Original post link

Got it. Let me check if version 8.0 has this issue.