Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 数据库产生主键id相同的重复数据
[Test Environment for TiDB]
[TiDB Version] V6.1.1
[Reproduction Path] After 2 million frequent updates on a single table, data duplication issue occurs
[Encountered Problem: Phenomenon and Impact]
After frequent updates on a single table in the database, data with the same primary key appears, but the update time fields are different.
Can you show the table structure?
DROP TABLE IF EXISTS `TAB_ORG`;
CREATE TABLE `TAB_ORG` (
`S_ID` varchar(100) NOT NULL COMMENT 'Organization ID',
`S_NAME` varchar(100) NOT NULL COMMENT 'Organization Name',
`S_PARENT_ID` varchar(100) NOT NULL COMMENT 'Parent Organization ID',
`S_CODE` varchar(100) DEFAULT '' COMMENT 'Code',
`I_TYPE` int(11) DEFAULT NULL COMMENT 'Organization Type ID',
`I_ORDER` int(11) NOT NULL COMMENT 'Order Number',
`S_PATH` varchar(100) NOT NULL COMMENT 'Organization Path',
`S_ICON` varchar(100) DEFAULT '' COMMENT 'Icon',
`DT_CREATE_TIME` datetime DEFAULT NULL COMMENT 'Creation Time',
`DT_UPDATE_TIME` datetime DEFAULT NULL COMMENT 'Update Time',
`S_CREATE_USER` varchar(100) DEFAULT '' COMMENT 'Created By',
`S_UPDATE_USER` varchar(100) DEFAULT '' COMMENT 'Updated By',
`I_DEL_FLG` int(11) NOT NULL DEFAULT 0 COMMENT 'Delete Flag',
`I_LEVEL` int(11) DEFAULT NULL COMMENT 'Organization Level',
`S_DESCRIPTION` varchar(200) DEFAULT '' COMMENT 'Description',
`S_EXTEND_INFO` longtext DEFAULT NULL COMMENT 'Extended Information',
`S_DOMAIN_TYPE` varchar(100) DEFAULT '' COMMENT 'Domain Type, 0 or empty for current level, subdomain',
PRIMARY KEY (`S_ID`,`I_DEL_FLG`) /*T![clustered_index] NONCLUSTERED */,
KEY `INDEX_TAB_ORG_S_PARENT_ID` (`S_PARENT_ID`),
KEY `INDEX_TAB_ORG_S_PATH` (`S_PATH`),
KEY `INDEX_TAB_ORG_I_TYPE` (`I_TYPE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='Organization Structure Table'
PARTITION BY HASH (`I_DEL_FLG`) PARTITIONS 2;
Is the i_del_flg
column also duplicated?
It is also duplicated. The two pieces of data are identical except for DT_CREATE_TIME
and DT_UPDATE_TIME
.
ADMIN CHECK table TAB_ORG;
Execute and see.
Execution result error:
SQL Error [8223] [HY000]: data inconsistency in table: TAB_ORG, index: PRIMARY, handle: 4, index-values: “handle: 30001, values: [KindString 510000000000-Structure KindInt64 0]” != record-values: “handle: 4, values: [KindString 510000000000-Structure KindInt64 0]”
The bug of data index inconsistency is generally very difficult to handle…
Thank you for your answer. May I ask if there is a solution? What generally causes this situation?
The data volume is not large, create a new table and import the data into the new table.
This is indeed a bug. If it’s an index inconsistency, you can still fix the index, but the data in your table is inconsistent, making it unmanageable. Unless you can determine from the business logic which data is abnormal and remove it directly, but even if you remove it now, the same issue might still occur later. You might really need to contact TiDB developers to take a look at this.
You can submit an issue now, right?
Try this. However, according to the documentation, this is used to recover index data from the table. If the records in the table correspond to a unique index/primary key, it will be more difficult to handle. It is unclear whether the delete statement can run normally on the table. Moreover, if the primary key of a clustered index table has issues, it should also be impossible to drop and rebuild it.
In that case, it must be a bug.
Index data inconsistency is generally caused by Lightning local mode not exiting correctly or using the tispark write-back feature, which did not complete the expected write.