Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: How to troubleshoot and resolve missing rows after a tidb-lightning import
【TiDB Usage Environment】Production\Testing Environment\POC
【TiDB Version】5.3
【Encountered Problem】
After importing CSV files with tidb-lightning (tried with versions 5.0 and 5.4), the tables contain fewer rows than the wc -l command counts in the source files. What should I do?
【Reproduction Path】
【Problem Phenomenon and Impact】
Most of the tables contain tens of millions of rows, and most of them are missing only a few dozen to a few hundred rows after the import.
First, check the lightning log to confirm whether the import was completed normally (from your description, it seems that some data is missing, which indicates there might be an issue). Can you provide the complete lightning log? Also, please provide the start and end times of the problematic import.
I imported many tables, one at a time, but each time there were missing records.
The complete log is on the client’s server, and currently, there’s no way to retrieve the file from there, so we can only take screenshots.
We can only know for sure by checking the log; right now we can only speculate. For example, if the source data contains duplicate primary keys, the table will end up with fewer rows than the source file, because a primary key value can be stored only once. Lightning detects this situation through the checksum comparison and reports it as an error before the import is completed.
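To test the duplicate-key theory on the source file itself, something like the sketch below could be run against each CSV before importing. It is only an illustration: the function name `find_duplicate_keys`, the `key_columns` positions, and the assumption that the file has a header row are all hypothetical and need to be adapted to the real export. It also compares keys as exact strings, so it will not catch values that are equal only after type conversion or under a case-insensitive collation on the target side.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_file, key_columns, has_header=True):
    """Return {key_tuple: count} for key values that occur more than once.

    csv_file    -- an open text file (or any iterable of lines)
    key_columns -- zero-based positions of the primary key columns
                   (hypothetical; set these to match the real table)
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row
    # Count each key tuple; blank lines are ignored.
    counts = Counter(
        tuple(row[i] for i in key_columns) for row in reader if row
    )
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Tiny in-memory sample: the key value "2" appears twice.
    sample = io.StringIO("id,name\n1,a\n2,b\n2,c\n3,d\n")
    print(find_duplicate_keys(sample, key_columns=[0]))  # {('2',): 2}
```

For a real file, open it with `open(path, newline="")` and pass the file object in. Note that the counter keeps every distinct key in memory, which may be heavy for tables with tens of millions of rows.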
There is this warning. The problem is that I exported it from the source database SQL Server. The source database has a primary key, and the target database also has one. There shouldn’t be any duplicate primary keys.
The message “duplicate key found” indicates that there is a duplicate key. Could you please share the table schema file? Also, any task configurations would be helpful.
I’m not very familiar with SQL Server, but perhaps you could try importing using the TiDB-backend? Alternatively, you can enable the duplicate-resolution = "record" feature in local-backend for Lightning, re-import, and then check what the duplicate key is.