Duplicate data with the same primary key ID generated in the database

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据库产生主键id相同的重复数据

| username: TiDBer_WXPSGmVF

[Test Environment for TiDB]
[TiDB Version] V6.1.1
[Reproduction Path] After 2 million frequent updates on a single table, data duplication issue occurs
[Encountered Problem: Phenomenon and Impact]
After frequent updates on a single table in the database, data with the same primary key appears, but the update time fields are different.

| username: tidb菜鸟一只 | Original post link

Can you show the table structure?

| username: TiDBer_WXPSGmVF | Original post link

DROP TABLE IF EXISTS `TAB_ORG`;
CREATE TABLE `TAB_ORG` (
  `S_ID` varchar(100) NOT NULL COMMENT 'Organization ID',
  `S_NAME` varchar(100) NOT NULL COMMENT 'Organization Name',
  `S_PARENT_ID` varchar(100) NOT NULL COMMENT 'Parent Organization ID',
  `S_CODE` varchar(100) DEFAULT '' COMMENT 'Code',
  `I_TYPE` int(11) DEFAULT NULL COMMENT 'Organization Type ID',
  `I_ORDER` int(11) NOT NULL COMMENT 'Order Number',
  `S_PATH` varchar(100) NOT NULL COMMENT 'Organization Path',
  `S_ICON` varchar(100) DEFAULT '' COMMENT 'Icon',
  `DT_CREATE_TIME` datetime DEFAULT NULL COMMENT 'Creation Time',
  `DT_UPDATE_TIME` datetime DEFAULT NULL COMMENT 'Update Time',
  `S_CREATE_USER` varchar(100) DEFAULT '' COMMENT 'Created By',
  `S_UPDATE_USER` varchar(100) DEFAULT '' COMMENT 'Updated By',
  `I_DEL_FLG` int(11) NOT NULL DEFAULT 0 COMMENT 'Delete Flag',
  `I_LEVEL` int(11) DEFAULT NULL COMMENT 'Organization Level',
  `S_DESCRIPTION` varchar(200) DEFAULT '' COMMENT 'Description',
  `S_EXTEND_INFO` longtext DEFAULT NULL COMMENT 'Extended Information',
  `S_DOMAIN_TYPE` varchar(100) DEFAULT '' COMMENT 'Domain Type, 0 or empty for current level, subdomain',
  PRIMARY KEY (`S_ID`,`I_DEL_FLG`) /*T![clustered_index] NONCLUSTERED */,
  KEY `INDEX_TAB_ORG_S_PARENT_ID` (`S_PARENT_ID`),
  KEY `INDEX_TAB_ORG_S_PATH` (`S_PATH`),
  KEY `INDEX_TAB_ORG_I_TYPE` (`I_TYPE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='Organization Structure Table'
PARTITION BY HASH (`I_DEL_FLG`) PARTITIONS 2;
| username: tidb菜鸟一只 | Original post link

Is the i_del_flg column also duplicated?

| username: TiDBer_WXPSGmVF | Original post link

It is also duplicated. The two pieces of data are identical except for DT_CREATE_TIME and DT_UPDATE_TIME.

| username: tidb菜鸟一只 | Original post link

ADMIN CHECK table TAB_ORG;
Execute and see.

| username: TiDBer_WXPSGmVF | Original post link

Execution result error:

SQL Error [8223] [HY000]: data inconsistency in table: TAB_ORG, index: PRIMARY, handle: 4, index-values: “handle: 30001, values: [KindString 510000000000-Structure KindInt64 0]” != record-values: “handle: 4, values: [KindString 510000000000-Structure KindInt64 0]”

| username: tidb菜鸟一只 | Original post link

The bug of data index inconsistency is generally very difficult to handle…

| username: TiDBer_WXPSGmVF | Original post link

Thank you for your answer. May I ask if there is a solution? What generally causes this situation?

| username: zhanggame1 | Original post link

The data volume is not large, create a new table and import the data into the new table.

| username: tidb菜鸟一只 | Original post link

This is indeed a bug. If it’s an index inconsistency, you can still fix the index, but the data in your table is inconsistent, making it unmanageable. Unless you can determine from the business logic which data is abnormal and remove it directly, but even if you remove it now, the same issue might still occur later. You might really need to contact TiDB developers to take a look at this.

| username: cassblanca | Original post link

You can submit an issue now, right?

| username: 有猫万事足 | Original post link

Try this. However, according to the documentation, this is used to recover index data from the table. If the records in the table correspond to a unique index/primary key, it will be more difficult to handle. It is unclear whether the delete statement can run normally on the table. Moreover, if the primary key of a clustered index table has issues, it should also be impossible to drop and rebuild it.

| username: redgame | Original post link

In that case, it must be a bug.

| username: WalterWj | Original post link

Index data inconsistency is generally caused by Lightning local mode not exiting correctly or using the tispark write-back feature, which did not complete the expected write.