Lightning import error: checksum mismatched remote vs local

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lighting导入报错checksum mismatched remote vs local

| username: 扬仔_tidb

【TiDB Usage Environment】Production Environment
【TiDB Version】
Source TiDB: v5.3.0; new cluster: v6.5.5
Export command: tiup dumpling v5.3.0, ran without errors
Import command: tidb-lightning v6.5.5, failed on the final table's checksum

【Encountered Problem: Phenomenon and Impact】

  1. Error
    [2023/11/08 08:24:57.403 +08:00] [ERROR] [restore.go:1562] ["restore all tables data failed"] [takeTime=33h35m18.140456268s] [error="[Lightning:Restore:ErrChecksumMismatch]checksum mismatched remote vs local => (checksum: 5082494353818434127 vs 4870243886886057165) (total_kvs: 1053425055 vs 1053425056) (total_bytes:182934745248 vs 182934745297)"]

[2023/11/08 08:24:57.403 +08:00] [ERROR] [restore.go:495] ["run failed"] [step=4] [error="[Lightning:Restore:ErrChecksumMismatch]checksum mismatched remote vs local => (checksum: 5082494353818434127 vs 4870243886886057165) (total_kvs: 1053425055 vs 1053425056) (total_bytes:182934745248 vs 182934745297)"]

[2023/11/08 08:24:57.403 +08:00] [INFO] [restore.go:1200] ["everything imported, stopping periodic actions"]

[2023/11/08 08:24:57.403 +08:00] [ERROR] [restore.go:505] ["the whole procedure failed"] [takeTime=33h35m43.982662786s] [error="[Lightning:Restore:ErrChecksumMismatch]checksum mismatched remote vs local => (checksum: 5082494353818434127 vs 4870243886886057165) (total_kvs: 1053425055 vs 1053425056) (total_bytes:182934745248 vs 182934745297)"]

[2023/11/08 08:24:57.403 +08:00] [ERROR] [restore.go:173] ["tables failed to be imported"] [count=1]

[2023/11/08 08:24:57.403 +08:00] [ERROR] [restore.go:175] [-] [table=scm.wxhc_logistics_order_detail] [status=checksum] [error="[Lightning:Restore:ErrChecksumMismatch]checksum mismatched remote vs local => (checksum: 5082494353818434127 vs 4870243886886057165) (total_kvs: 1053425055 vs 1053425056) (total_bytes:182934745248 vs 182934745297)"]

[2023/11/08 08:24:57.419 +08:00] [INFO] [checksum.go:480] ["service safe point keeper exited"]

[2023/11/08 08:24:57.419 +08:00] [ERROR] [main.go:103] ["tidb lightning encountered error stack info"] [error="[Lightning:Restore:ErrChecksumMismatch]checksum mismatched remote vs local => (checksum: 5082494353818434127 vs 4870243886886057165) (total_kvs: 1053425055 vs 1053425056) (total_bytes:182934745248 vs 182934745297)"] [errorVerbose="[Lightning:Restore:ErrChecksumMismatch]checksum mismatched remote vs local => (checksum: 5082494353818434127 vs 4870243886886057165) (total_kvs: 1053425055 vs 1053425056) (total_bytes:182934745248 vs 182934745297)\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66

  2. I checked the exported SQL file of the wxhc_logistics_order_detail table, and its last few rows have already been imported into the new TiDB cluster.
  3. Should I re-import the wxhc_logistics_order_detail table? Is the error caused by the mismatched tool versions used for export and import?
| username: 扬仔_tidb | Original post link

Both the old and new clusters have the parameter lower_case_table_names set to 2.

| username: 有猫万事足 | Original post link

The checksum is triggered only during physical mode import. The checksum inconsistencies I encountered are generally due to the following reasons:

  1. Someone else wrote to the table during the import.
  2. For incremental imports, the SQL files from a previous import were not completely deleted due to various issues, so the same data gets imported again on the next run, producing duplicate rows for a particular day (physical import ignores index constraints, so the duplicates go undetected).

If you are certain that only the last table has issues, you might consider removing the SQL files and checkpoints of the other tables and importing just the last table.

Additionally, when you run Lightning again, the leftover checkpoint will block the retry, so you may need to use tidb-lightning-ctl to clear it. Be very careful and read the command prompts closely so you do not destroy the checkpoints of other tables; doing so drops those tables from the target cluster. If you clear them by accident, immediately stop GC on the target database and flash back the dropped tables. (I made this mistake once :sweat_smile:)
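
For illustration, a minimal sketch of scoping the cleanup to a single table rather than `all`, wrapped in Python. The config path and table name are the ones from this thread; treat the exact invocation as an assumption and check `tidb-lightning-ctl --help` on your version first:

```python
import subprocess

# Destroy the failed checkpoint for ONE table instead of "all".
# The per-table form takes a back-quoted `schema`.`table` argument.
# WARNING: this drops the partially imported table in the target
# cluster, so double-check the table name before running.
table = "`scm`.`wxhc_logistics_order_detail`"

subprocess.run(
    [
        "tidb-lightning-ctl",
        "--config", "conf/tidb-lightning.toml",
        f"--checkpoint-error-destroy={table}",
    ],
    check=True,  # raise CalledProcessError on a non-zero exit
)
```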

| username: 扬仔_tidb | Original post link

Thank you for your response.
My target cluster is a brand-new empty database. The problem is that the error occurred partway through the import, and I am not sure whether other tables were still waiting to be imported and were left unimported when this error interrupted the process.

| username: 有猫万事足 | Original post link

From the process perspective, the SST files for the data have already reached the target cluster before this checksum is performed. That is, all of your data has been imported, but some of it is incorrect; for example, a unique index might correspond to 2 rows of data.

In other words, tables whose checksums have not yet run could also have data issues, but the import itself has fully completed, and any checksum that already passed guarantees that table has no errors. That is why I suggest you try importing this one table's data separately.

| username: 有猫万事足 | Original post link

It looks like someone secretly inserted a piece of data during the import. :sweat_smile:

| username: 扬仔_tidb | Original post link

How do I install tidb-lightning-ctl?

| username: 有猫万事足 | Original post link

Under the ctl component.

| username: Fly-bird | Original post link

You can download tidb-lightning-ctl directly from the official website; it is part of the bundled package.

| username: 扬仔_tidb | Original post link

After running tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all as in the official website example, the same error is still reported on re-import.

| username: 扬仔_tidb | Original post link

Thanks to the experts above for their answers. The issue has been identified and resolved.

Cause: In version 6, the default collation is case-insensitive, consistent with MySQL, whereas in version 5 it was case-sensitive, meaning both jD and JD could be inserted under the unique index in version 5.

Solution: Drop the unique index for the import. After the import completes, delete the conflicting rows, keeping only the uppercase ones; the business side has confirmed this is acceptable.
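
To make the cleanup concrete, here is a minimal sketch for locating the case-insensitive collisions before deleting them. The column name uniq_col and the connection details are hypothetical placeholders, not from the original post:

```python
import pymysql

# Hypothetical connection details and column name; adjust to your schema.
conn = pymysql.connect(host="127.0.0.1", port=4000, user="root",
                       password="", database="scm")

# Group the unique-key column case-insensitively: any group with more
# than one row is a pair like 'jD' / 'JD' that collided after the import.
FIND_DUPES = """
SELECT LOWER(uniq_col) AS folded, COUNT(*) AS n
FROM wxhc_logistics_order_detail
GROUP BY LOWER(uniq_col)
HAVING COUNT(*) > 1
"""

with conn.cursor() as cur:
    cur.execute(FIND_DUPES)
    for folded, n in cur.fetchall():
        print(f"{folded}: {n} case-insensitive duplicates")
conn.close()
```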

| username: 胡杨树旁 | Original post link

Incremental mode, local-backend import, with the parameter on-duplicate = "replace" set. An error occurred during the import: [2023/12/04 14:29:40.914 +08:00] [ERROR] [main.go:103] ["tidb lightning encountered error stack info"] [error="[Lightning:Restore:ErrChecksumMismatch] checksum mismatched remote vs local => (checksum: 16703975120714088230 vs 12785062943811019446) (total_kvs: 57132710 vs 57309744) (total_bytes:1267649449255 vs 1267662372737)"] [errorVerbose="[Lightning:Restore:ErrChecksumMismatch] checksum mismatched remote vs local => (checksum: 16703975120714088230 vs 12785062943811019446) (total_kvs: 57132710 vs 57309744) (total_bytes:1267649449255 vs 1267662372737)"]

| username: 有猫万事足 | Original post link

It does seem like there might be duplicate primary keys or unique keys in the inserted data. “Local” refers to your local data, and “Remote” refers to the data in your database. The database has fewer entries than your local data.

If you rule out the possibility that someone deleted data from the corresponding table during the import, then you should focus on checking for primary key or unique key conflicts.

| username: 胡杨树旁 | Original post link

If it is a primary key or unique key conflict, how should that be investigated? The target database is a historical database, and this table has only a primary key on ID. I compared the count of distinct IDs with the table's count(*), and they are the same. Moreover, the source data has already been cleaned up, so there is no way to query it.

| username: 有猫万事足 | Original post link

The safest way is, of course, to write a script that reads the imported SQL/CSV files and compares them with the target database. For SST files, however, there is indeed no good solution.

If your data falls into the first two cases (SQL/CSV files), you might as well use logical import directly. Logical import does not run a checksum afterwards. Although physical import is fast, once you account for the checksum and analyze phases, it is hard to say whether it is much faster than logical import for incremental imports.

This is because, with incremental imports in physical mode, the checksum covers the entire table, so the historical data is recalculated as well.
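
If you want to see the cluster-side numbers that Lightning compares against, TiDB's ADMIN CHECKSUM TABLE statement reports the same checksum / total KVs / total bytes triple for a table. A minimal sketch, with placeholder connection details:

```python
import pymysql

# Placeholder connection details; point this at the target cluster.
conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")

# ADMIN CHECKSUM TABLE scans the whole table (historical data included),
# returning the same fields Lightning logs: checksum, total KVs, total bytes.
with conn.cursor() as cur:
    cur.execute("ADMIN CHECKSUM TABLE scm.wxhc_logistics_order_detail")
    for db, tbl, checksum, total_kvs, total_bytes in cur.fetchall():
        print(f"{db}.{tbl}: checksum={checksum} "
              f"total_kvs={total_kvs} total_bytes={total_bytes}")
conn.close()
```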

| username: 胡杨树旁 | Original post link

How should this script be written? I have no idea at all…

| username: heiwandou | Original post link

Write a script to test it.

| username: 有猫万事足 | Original post link

Although it might sound a bit unreliable, I still want to say that whether you have no idea at all or don’t know how to implement something, you can try asking ChatGPT first. :rofl:

If you describe it clearly, you can at least get a simple Python implementation, which might work with some tweaking. At worst it won't be multithreaded and performance can't be guaranteed, but the logic is unlikely to be significantly wrong.
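
In that spirit, a minimal sketch of such a comparison script, assuming the increment was exported as a headerless CSV with the primary key ID in the first column; the file name, table, and connection details are all hypothetical:

```python
import csv
import pymysql

# Hypothetical inputs; adjust to your export, schema, and connection.
CSV_PATH = "wxhc_increment.csv"   # assumes no header row, ID in column 0
TABLE = "wxhc_logistics_order_detail"

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root",
                       password="", database="scm")

# Pass 1: collect primary keys from the exported file and flag any ID
# that appears twice in the file itself (a duplicate in the source data).
seen, dupes = set(), set()
with open(CSV_PATH, newline="") as f:
    for row in csv.reader(f):
        pk = row[0]
        if pk in seen:
            dupes.add(pk)
        else:
            seen.add(pk)

print(f"{len(seen)} distinct IDs in file, {len(dupes)} duplicated in file")

# Pass 2: check the exported IDs against the target table in batches.
ids = list(seen)
missing = []
with conn.cursor() as cur:
    for i in range(0, len(ids), 1000):
        batch = ids[i:i + 1000]
        placeholders = ",".join(["%s"] * len(batch))
        cur.execute(
            f"SELECT id FROM {TABLE} WHERE id IN ({placeholders})", batch)
        found = {str(r[0]) for r in cur.fetchall()}
        missing.extend(pk for pk in batch if pk not in found)

print(f"{len(missing)} IDs from the file are missing in the target table")
conn.close()
```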

| username: 胡杨树旁 | Original post link

Good idea, how did I forget to ask ChatGPT?