Checksum mismatch in SQL logs during Lightning import

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lightning导入sql日志checksum不一致

| username: Ming

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.4.1
【Encountered Problem】Lightning reports a checksum error in the log when importing SQL files
【Reproduction Path】

  1. Export data from the v3.0.8 cluster using Dumpling v5.4.0 in SQL mode
  2. Point Lightning v5.4.0 at the Dumpling output directory and import the SQL files into the new v5.4.1 cluster

【Problem Phenomenon and Impact】
The downstream cluster does not have this table.
The checksum results are inconsistent.
Log:
error="checksum: mismatched remote vs local => (checksum: 4601569717304146001 vs 3003714860105671480) (total_kvs: 6442057882 vs 6442057914) (total_bytes: 659445170496 vs 659445174266)"
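
For context, the "remote" figure is the checksum the downstream cluster reports for the table after import, while the "local" figure is what Lightning computed from the source files. A quick calculation on the logged numbers (values copied from the error message above) shows how small the gap is:

```python
# Values copied from the Lightning error message above.
remote_kvs, local_kvs = 6442057882, 6442057914
remote_bytes, local_bytes = 659445170496, 659445174266

# Local (from the source files) exceeds remote (what the cluster reports) by
# 32 KV pairs and 3,770 bytes -- consistent with a handful of rows that share
# a key being overwritten during import, though the log alone cannot prove that.
print("kv diff:  ", local_kvs - remote_kvs)      # 32
print("byte diff:", local_bytes - remote_bytes)  # 3770
```
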
| username: tidb狂热爱好者 | Original post link

Did you export with mysqldump?

| username: Ming | Original post link

I executed two imports that night, and the first one had no issues.

| username: buchuitoudegou | Original post link

What do you mean by "the first import had no issues"? A checksum mismatch can be caused by data conflicts (such as duplicate keys).

You might want to refer to this document: TiDB Lightning Troubleshooting | PingCAP Docs

| username: Ming | Original post link

There are indeed duplicate keys. How should we troubleshoot the cause of these duplicates? Also, if there are duplicates, is the actual data unaffected, with only the checksum check reporting an error?

| username: buchuitoudegou | Original post link

It is possible that your original data contains duplicates, or that data already existed in the table before the import (incremental import mode).

You might want to check out Lightning's error handling feature, which can help you locate the erroneous rows and skip some errors: TiDB Lightning Error Handling | PingCAP Docs
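
If you want to hunt for the duplicates in the source data yourself before re-importing, one approach is a GROUP BY over the columns of the unique key on the upstream cluster (the old table may not enforce the same constraint as the new one). A minimal sketch, assuming the pymysql driver and hypothetical names mydb.mytable with a unique key on (col_a, col_b):

```python
import pymysql

# Hypothetical names -- replace with your real schema, table, and unique-key columns.
DB, TABLE, UNIQUE_COLS = "mydb", "mytable", ["col_a", "col_b"]

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="", database=DB)
cols = ", ".join(UNIQUE_COLS)
try:
    with conn.cursor() as cur:
        # Any group with more than one row is a duplicate of the unique key.
        cur.execute(
            f"SELECT {cols}, COUNT(*) AS cnt FROM {TABLE} "
            f"GROUP BY {cols} HAVING COUNT(*) > 1"
        )
        for row in cur.fetchall():
            print(row)  # (col_a value, col_b value, occurrence count)
finally:
    conn.close()
```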

| username: buchuitoudegou | Original post link

If there is an issue with the checksum, it indicates either a problem during the import process (such as network issues causing some key-value pairs not to be transmitted) or that the imported data violates the schema's integrity (for example, duplicate values in a unique index). Such data should be considered unusable for production.
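
Since the "remote" side of the comparison is a cluster-side checksum, it can be recomputed on the downstream table with ADMIN CHECKSUM TABLE and compared against the "local" figures in the Lightning log. A sketch, with the host, credentials, and db/table names as placeholders:

```python
import pymysql

# Placeholders -- point this at the downstream v5.4.1 cluster and the imported table.
conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="", database="mydb")
try:
    with conn.cursor() as cur:
        cur.execute("ADMIN CHECKSUM TABLE mytable")
        # Result columns: Db_name, Table_name, Checksum_crc64_xor, Total_kvs, Total_bytes
        db_name, tbl, checksum, total_kvs, total_bytes = cur.fetchone()
        print(db_name, tbl, checksum, total_kvs, total_bytes)
finally:
    conn.close()
```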

| username: tidb狂热爱好者 | Original post link

If the checksum does not match, the imported data is incorrect; the mechanism is similar to an MD5 check.
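
To illustrate the analogy: as with MD5, a difference of even one byte in the input yields a completely different digest, which is how a checksum exposes data that is not exactly what was expected.

```python
import hashlib

# Two inputs that differ by a single byte produce unrelated digests.
print(hashlib.md5(b"row1,row2,row3").hexdigest())
print(hashlib.md5(b"row1,row2,row4").hexdigest())
```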

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.