Primary Key Issues

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 主键问题

| username: Chen1234

【TiDB Usage Environment】Production Environment
【TiDB Version】5.4.2
【Process】The isdelete field in the queueok table is of type int and has only three values (0, 1, null). Directly counting the table data gives 813,602,244. Querying by isdelete value gives results of 654,550,915, 332,686,701, and 96,173 respectively. The sum of these three values exceeds the total count by over 100 million.
【Solution】Re-dumping and importing the table resolved the issue. The count of the data is 813,602,244, and the sum of the three values queried by isdelete matches the total count.
【Issue Encountered】
The queueid in the queueok table is the primary key. During import, a primary key conflict was reported. Grepping the SQL file by queueid revealed duplicate primary key values. The question is, why does the database return only one record when querying by primary key, but the dumpling file contains two records with the same primary key ID?

| username: tidb菜鸟一只 | Original post link

You are experiencing an issue with table and index inconsistency, which is usually triggered by a bug.

| username: zhanggame1 | Original post link

Is this table a clustered index or a non-clustered index?

| username: Chen1234 | Original post link

The table is too large, we can’t always solve it by dumping. Is there any way to avoid this?

| username: Chen1234 | Original post link

Non-clustered index

| username: tidb狂热爱好者 | Original post link

Is this an issue with table and index inconsistency? Usually, such situations are triggered by a bug.
There’s still this bug?

| username: MrSylar | Original post link

When the primary key is varchar, dumpling should determine the split range based on _tidb_rowid instead of the primary key. So, does the original table itself have two duplicate primary key records?

| username: Chen1234 | Original post link

I checked all the data in the sharded databases on the source side, and there is only one record.

| username: zhanggame1 | Original post link

Why not switch to clustered indexes? They offer faster read and write speeds and eliminate the primary key index issue.

| username: WalterWj | Original post link

Has this table been imported using Lightning local mode or written back using the tispark write-back feature?

| username: Chen1234 | Original post link

This table is sharded in the MySQL source and synchronized to TiDB for aggregation through Otter. Lightning local requires three times the free space of the table, but this disk space is not sufficient, so data can only be imported using the mysql < **.sql method.

| username: zhanggame1 | Original post link

mysql < **.sql is for file import, right? For file import, the table types on both sides can be different. How about changing it to a clustered table?

| username: Chen1234 | Original post link

I’ll give it a try.

| username: Min_Chen | Original post link


This issue is generally caused by duplicate data on the primary key during the import process. After importing through Lightning’s non-SQL backend, it results in one primary key index key corresponding to two or more actual data entries. When you query, the actual count is based on the primary key index, while Dumpling exports by scanning the entire table and retrieves the actual data. This discrepancy leads to the observed phenomenon.

| username: Jellybean | Original post link

This issue is very strange.

  1. For the execution process above, can you post the explain analyze execution process? Let’s analyze it together.

  2. How was the data in this table in TiDB written? Was it synchronized through DM, or using the lightning tool, or directly inserted by the business?

| username: Chen1234 | Original post link

  1. This table’s source is a sharded MySQL database, synchronized to TiDB via Otter. During initialization, it was done through Lightning.
| username: TiDBer_iCdTOZ1r | Original post link

This should be different from MySQL, right?