lightningErrChecksumMismatch: Checksum Mismatched Remote vs Local

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lightningErrChecksumMismatch checksum mismatched remote vs local

| username: 胡杨树旁

Cluster version 6.1.5, lightning incremental import of over 500G of data, the target table is not empty, lightning configuration file

[lightning]
# Log
level = "info"
file = "/home/tidb/tidb-lightning.log"
max-size = 128 # MB
max-days = 28
max-backups = 14
index-concurrency = 8
table-concurrency = 16
io-concurrency = 20
pprof-port = 8289

[tikv-importer]
# Choose the local backend
backend = "local"
# Set the temporary storage address for sorted key-value pairs, the target path needs to be an empty directory
sorted-kv-dir = "/tidb-data4/sort_ing03"
# Whether to enable compression when sending KV in physical import mode to TiKV
#compress-kv-pairs = "gz"
incremental-import = true
on-duplicate = "replace"

[checkpoint]
# Whether to enable breakpoint resume.
# When importing data, TiDB Lightning records the progress of the current table import.
# So even if TiDB Lightning or other components exit abnormally, it can avoid re-importing completed data upon restart.
enable = true
# The database name for storing breakpoints.
schema = "tidb_lightning_checkpoint"
# The method for storing breakpoints.
#  - file: Store in the local file system.
#  - mysql: Store in a MySQL-compatible database server.
driver = "file"
dsn = "/home/tidb/tidb_lightning_checkpoint.pb"

[mydumper]
# Source data directory.
data-source-dir = "/tidb-data5/con"
#incremental-import = true

# Configure wildcard rules, the default rule filters all tables under the mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA, INSPECTION_SCHEMA system databases
# If this item is not configured, an "unable to find schema" exception will occur when importing system tables
[tidb]
# Information of the target cluster
host = ""
port = 
user = ""
password = ""
# Table schema information is obtained from the "status port" of TiDB.
status-port = 10080
# Address of the cluster pd
pd-addr = ""

#tidb-lightning references the TiDB library and generates some logs.
# Set the log level of the TiDB library.
log-level = "error"

# Set TiDB session variables to improve the speed of Checksum and Analyze.
# Refer to the "Control Analyze Concurrency" document for the definition of each parameter
build-stats-concurrency = 20
distsql-scan-concurrency = 100
index-serial-scan-concurrency = 20
checksum-table-concurrency = 16

# The default SQL mode for parsing and executing SQL statements.
sql-mode = "ONLY_FULL_GROUP_BY,NO_ENGINE_SUBSTITUTION"
# `max-allowed-packet` sets the maximum packet size allowed for database connections,
# corresponding to the `max_allowed_packet` system parameter. If set to 0,
# the global level `max_allowed_packet` of the downstream database will be used.
max-allowed-packet = 67_108_864

Table creation statement:

CREATE TABLE `face_predict_record` (
  `id` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `orderId` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Insurance order ID',
  `type` varchar(32) DEFAULT NULL COMMENT '0: Success, not a re-photograph or ID photo; 1: Re-photograph; 2: ID photo face; 3: Both re-photograph and ID photo face; 4: Recognition failed, input field name or content error',
  `createBy` varchar(320) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `createDate` datetime DEFAULT NULL,
  `updateBy` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `updateDate` datetime DEFAULT NULL,
  `remarks` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `status` varchar(1) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '0' COMMENT '0: Enabled; 1: Deleted',
  `enterParam` longtext DEFAULT NULL COMMENT 'Request input parameters',
  `returnParam` text DEFAULT NULL COMMENT 'Return parameters',
  `systemSource` varchar(10) DEFAULT NULL COMMENT 'System source',
  `requestType` varchar(1) DEFAULT NULL COMMENT 'Request type (1: Anti-rephotograph area determination; 2: Anti-rephotograph probability determination; 3: Public security face recognition)',
  `responseCode` varchar(10) DEFAULT NULL COMMENT 'Response result (200: Anti-rephotograph recognition success; 999: Anti-rephotograph recognition failure; SUCCESS: Public security face success; FAIL: Public security face failure)',
  `responseMsg` varchar(10) DEFAULT NULL COMMENT 'Response error code (service return)',
  `showCode` varchar(10) DEFAULT NULL COMMENT 'Display error code (frontend page)',
  `showMsg` varchar(100) DEFAULT NULL COMMENT 'Display prompt message (frontend page)',
  `countFlag` varchar(1) DEFAULT NULL COMMENT 'Whether to count (0: No; 1: Yes)',
  `facePic` text DEFAULT NULL COMMENT 'Face recognition photo',
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_insurance_orderId` (`orderId`),
  KEY `idx_createDate` (`createDate`),
  KEY `idx_orderId` (`orderId`),
  KEY `idx_requestType` (`requestType`),
  KEY `idx_responseCode` (`responseCode`),
  KEY `idx_responseMsg` (`responseMsg`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

Checked count(distinct ID) = count(*)
Checked the lower_case_table_names parameter of the upstream and downstream clusters, the value of the export cluster is 1, and the value of the import cluster is 2.

| username: WalterWj | Original post link

In this case, you need to disable checksum, and it is not recommended to use local mode for import. Check the official documentation for more details.

| username: 胡杨树旁 | Original post link

Which parameter controls disabling the checksum? I checked the exported and imported data and found some discrepancies.

| username: 小龙虾爱大龙虾 | Original post link

I remember it will automatically check if the target table is empty. Won’t you get an error?

| username: 胡杨树旁 | Original post link

No errors were reported during the initial import.

| username: dba远航 | Original post link

Take a good look at the 303 training video.

| username: tidb菜鸟一只 | Original post link

–checksum=off