How to Resolve and Troubleshoot Missing Data During TiDB-Lightning Import

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-lightning导入漏数如何解决与排查

| username: TiDBer_zfadduuq

【TiDB Usage Environment】Production\Testing Environment\POC
【TiDB Version】5.3
【Encountered Problem】
When I import CSV files with tidb-lightning (I tried versions 5.0 and 5.4), the target tables end up with fewer rows than the line count reported by wc -l. What should I do?
【Reproduction Path】
【Problem Phenomenon and Impact】

| username: TiDBer_zfadduuq | Original post link

Most of the tables have tens of millions of rows, and in most of them only a few dozen to a few hundred rows are missing.
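Before digging into Lightning itself, it may be worth confirming that wc -l measures the right thing: it counts newline characters, not CSV records. A header line, quoted fields that contain embedded newlines, or a missing final newline can all make the two numbers differ, and embedded newlines in particular could plausibly account for a gap of dozens to hundreds of rows. The following is a minimal Python sketch, assuming comma-delimited, double-quoted UTF-8 CSV (Python's default csv dialect) with a header line; adjust these assumptions to match the actual files and the CSV settings used for Lightning.

```python
import csv
import io
import sys


def count_newlines(f) -> int:
    """Roughly what wc -l reports: the number of newline characters."""
    total = 0
    for chunk in iter(lambda: f.read(1 << 20), ""):  # read 1 MiB at a time
        total += chunk.count("\n")
    return total


def count_csv_records(f, has_header: bool = True) -> int:
    """Count logical CSV records; quoted fields may span several lines."""
    records = sum(1 for row in csv.reader(f) if row)  # blank lines yield [] and are skipped
    if has_header and records > 0:
        records -= 1  # the header is not a data row
    return records


def report(f, has_header: bool) -> tuple[int, int]:
    lines = count_newlines(f)
    f.seek(0)  # rewind for the second pass
    return lines, count_csv_records(f, has_header)


if __name__ == "__main__":
    # Usage: python count_csv.py data.csv [--no-header]
    # With no arguments, run on a small in-memory sample instead.
    if len(sys.argv) > 1:
        has_header = "--no-header" not in sys.argv[2:]
        # newline="" keeps raw line endings, as the csv module expects.
        with open(sys.argv[1], encoding="utf-8", newline="") as f:
            lines, records = report(f, has_header)
    else:
        sample = 'id,name\n1,Alice\n2,"Bob\nSmith"\n3,Carol'
        lines, records = report(io.StringIO(sample, newline=""), True)
    print(f"newlines (what wc -l counts): {lines}")
    print(f"CSV data records:             {records}")
```

If the CSV record count matches SELECT COUNT(*) on the target table, the difference comes from how wc -l counts lines rather than from rows lost during the import.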

| username: TammyLi | Original post link

First, check the lightning log to confirm whether the import completed normally. From your description, some data is missing, which suggests something went wrong. Can you provide the complete lightning log, along with the start and end times of the problematic import?

| username: TiDBer_zfadduuq | Original post link

I imported many tables, one at a time, but each time there were missing records.

| username: TiDBer_zfadduuq | Original post link

The complete log is on the client’s server, and there’s currently no way to copy the file off it, so I can only share screenshots.
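Since the full log can't leave the client's server, one workaround is to summarize it there and share only the summary. Below is a minimal Python sketch, assuming the log uses the bracketed [time] [LEVEL] [source] [message] layout that TiDB components write. It counts lines per level and keeps the first few lines that contain certain keywords. Only "duplicate key found" comes from this thread; the other keywords are guesses to adjust as needed.

```python
import re
import sys
from collections import Counter

# Assumed line layout (TiDB unified log format):
#   [2022/05/10 10:00:00.000 +08:00] [WARN] [source.go:123] ["message"] [key=value]
LEVEL_RE = re.compile(r"^\[[^\]]*\] \[([A-Z]+)\]")

# Lower-case substrings to surface. "duplicate key found" is the warning quoted
# in this thread; "checksum" and "error" are guesses.
KEYWORDS = ("duplicate key found", "checksum", "error")


def summarize(lines, keywords=KEYWORDS, max_hits=20):
    """Count log lines per level and keep the first few lines matching a keyword."""
    levels = Counter()
    hits = []
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            levels[match.group(1)] += 1
        lowered = line.lower()
        if len(hits) < max_hits and any(k in lowered for k in keywords):
            hits.append(line.rstrip("\n")[:300])  # truncate very long lines
    return levels, hits


def main(argv):
    if len(argv) < 2:
        print("usage: summarize_log.py LOGFILE")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        levels, hits = summarize(f)
    for level, count in levels.most_common():
        print(f"{level:<6} {count}")
    print("--- first matching lines ---")
    for line in hits:
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

The output is a few dozen lines of text, which is easier to copy or photograph than the whole log.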

| username: buchuitoudegou | Original post link

You need to check the log to know for sure; right now, we can only speculate. For example, if the source data contains duplicate primary keys, the table will end up with fewer rows than the source, because the primary key is unique and duplicates collapse into a single row. Lightning detects this through its checksum comparison and reports an error before the import completes.
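To test the duplicate-primary-key theory directly against the source files, without relying on the log, the CSV can be scanned for repeated key values. Here is a minimal Python sketch; the key column positions are a hypothetical input you would take from the table schema, and it assumes the same CSV dialect and header handling as the files Lightning reads.

```python
import csv
import sys
from itertools import islice


def find_duplicate_keys(rows, key_columns, has_header=True):
    """Scan CSV rows and report primary-key values that occur more than once.

    rows: a file object opened with newline="" (or any iterable of CSV lines).
    key_columns: 0-based positions of the primary-key columns (hypothetical
    input; take them from the table schema).
    Returns (data_records, distinct_keys, duplicates), where duplicates maps
    each repeated key to the 1-based data-record numbers it appears at.
    """
    reader = csv.reader(rows)
    if has_header:
        next(reader, None)  # skip the header line, if any
    first_seen = {}  # key -> record number of its first occurrence
    duplicates = {}  # key -> every record number where it occurs
    records = 0
    for row in reader:
        if not row:  # blank line
            continue
        records += 1
        key = tuple(row[i] for i in key_columns)
        if key in first_seen:
            duplicates.setdefault(key, [first_seen[key]]).append(records)
        else:
            first_seen[key] = records
    # With a unique primary key, the table can hold at most one row per key.
    return records, len(first_seen), duplicates


def main(argv):
    if len(argv) < 3:
        print("usage: find_dup_keys.py FILE COL [COL ...]   (0-based key columns)")
        return
    path, key_columns = argv[1], [int(c) for c in argv[2:]]
    with open(path, encoding="utf-8", newline="") as f:
        records, distinct, duplicates = find_duplicate_keys(f, key_columns)
    print(f"data records: {records}, distinct keys: {distinct}, "
          f"rows lost to duplicates: {records - distinct}")
    for key, where in islice(duplicates.items(), 20):  # show the first 20
        print(key, "-> data records", where)


if __name__ == "__main__":
    main(sys.argv)
```

The script compares key values as exact text. If a key column is a string, the target table's collation may treat values as equal that the script considers different, and type conversion during import (for example, reduced date/time precision) could also make distinct source values collide, so a clean result does not fully rule out duplicates. Keeping every key in memory may also need a lot of RAM for tables with tens of millions of rows.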

| username: TiDBer_zfadduuq | Original post link

There is this warning. But I exported the data from the source database, SQL Server, where the table has a primary key, and the target table has one too, so there shouldn’t be any duplicate primary keys.

| username: buchuitoudegou | Original post link

The message “duplicate key found” means that a duplicate key was detected. Could you share the table schema file, along with any task configuration?

I’m not very familiar with SQL Server, but perhaps you could try importing with the TiDB-backend instead? Alternatively, with the Local-backend, you can set duplicate-resolution = "record" in the Lightning configuration, re-import, and then check which keys are duplicated.

| username: system | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.