Chaotic Data Query in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb查询数据混乱

| username: TiDBer_AcAczQ2E

[TiDB Usage Environment] Test
[TiDB Version] 5.3
[Encountered Problem]
The composite primary key of this table is item_id and erpsource. When querying by delivery_id, two records with delivery_id=4025931 are returned, and these two records have the same composite primary key values!

SELECT * from erp_sdb_wms_delivery_items where delivery_id=4025931

When querying by item_id, only one record with delivery_id=4025931 is returned, and no duplicate composite primary key records appear.

SELECT * from erp_sdb_wms_delivery_items where item_id=7981775


[Problem Phenomenon and Impact]
Table structure:

CREATE TABLE `erp_sdb_wms_delivery_items`  (
  `item_id` bigint(20) NOT NULL,
  `order_item_id` bigint(20) NULL DEFAULT NULL,
  `delivery_id` bigint(20) NULL DEFAULT NULL,
  `product_id` bigint(20) NULL DEFAULT NULL,
  `shop_product_id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `bn` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `product_name` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `number` bigint(20) NULL DEFAULT NULL,
  `pick_number` bigint(20) NULL DEFAULT NULL,
  `addon` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `sell_code` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `iscustomitem` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `unique_code_status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `unique_code_status_lasttime` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `unique_code_status_op_id` bigint(20) NULL DEFAULT NULL,
  `is_freeze` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `freeze_time` bigint(20) NULL DEFAULT NULL,
  `unique_code` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `picurl` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `photo` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `photo_design` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `photo_ai` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `photo_append` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `word_photo` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `word_append` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `letter_photo` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `letter_append` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `verify` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `verify_num` bigint(20) NULL DEFAULT NULL,
  `price` double NULL DEFAULT NULL,
  `sale_price` double NULL DEFAULT NULL,
  `pmt_price` double NULL DEFAULT NULL,
  `custom_type` int(11) NULL DEFAULT NULL,
  `ispickup` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `checknum` bigint(20) NULL DEFAULT NULL,
  `photoimgispull` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `wordphotoimgispull` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `letterphotoimgispull` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `glasses_id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `glasses_sendtime` bigint(20) NULL DEFAULT NULL,
  `defective_goods` int(11) NULL DEFAULT NULL,
  `defective_memo` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `defective_supplier` bigint(20) NULL DEFAULT NULL,
  `defective_time` bigint(20) NULL DEFAULT NULL,
  `clarity_photo` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `check_clarity` bigint(20) NULL DEFAULT NULL,
  `clarity_time` bigint(20) NULL DEFAULT NULL,
  `clarity_type` bigint(20) NULL DEFAULT NULL,
  `is_urgent` int(11) NULL DEFAULT NULL,
  `urgent_opid` bigint(20) NULL DEFAULT NULL,
  `gspdf_status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `gspdf_memo` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `gspdf_time` bigint(20) NULL DEFAULT NULL,
  `myjit_pull_status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `myjit_pull_time` bigint(20) NULL DEFAULT NULL,
  `gspdf_order_sn` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `gspdf_invoice_no` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `zworders_status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `zworders_memo` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
  `zworders_time` bigint(20) NULL DEFAULT NULL,
  `zworders_sn` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `zworders_no` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `is_only_product` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `clarity_intimes` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `erpsource` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`item_id`, `erpsource`) USING BTREE,
  INDEX `index_unique_code`(`unique_code`) USING BTREE,
  INDEX `index_order_item_id`(`order_item_id`) USING BTREE,
  INDEX `index_gspdf_status`(`gspdf_status`) USING BTREE,
  INDEX `ind_ido`(`ispickup`, `delivery_id`, `order_item_id`) USING BTREE,
  INDEX `ind_number_delivery`(`delivery_id`, `number`) USING BTREE,
  INDEX `ind_zworders_status`(`zworders_status`) USING BTREE,
  INDEX `ind_bn_status_code`(`bn`, `delivery_id`, `unique_code`) USING BTREE,
  INDEX `ind_unique_code_status`(`unique_code_status`) USING BTREE,
  INDEX `idx_deliveryid_customtype_isfreeze_itemid_productid`(`delivery_id`, `custom_type`, `is_freeze`, `item_id`, `product_id`) USING BTREE,
  INDEX `idx_deliveryid_uniquecode_productid`(`delivery_id`, `unique_code`, `product_id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = Compact;

[Attachment]

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: TiDBer_AcAczQ2E | Original post link

The version of TiSpark used for data writing is 2.4.1.

| username: Billmay表妹 | Original post link

Is it possible to consider upgrading TiSpark to version 3.0 and give it a try?

| username: TiDBer_AcAczQ2E | Original post link

Not planning to upgrade, is there any solution?

| username: ShawnYan | Original post link

Just add “order by”. The result set of this query is not forcibly sorted.

| username: ShawnYan | Original post link

You can also refer to this question.

| username: TiDBer_AcAczQ2E | Original post link

We’re not talking about the same issue, are we? :sweat_smile:

| username: ealam_小羽 | Original post link

Is this data itself undergoing changes? If not, could you provide two examples of this data after desensitization for verification?

| username: ealam_小羽 | Original post link

4.0.16 simulation did not reproduce.

| username: TiDBer_AcAczQ2E | Original post link

Written with TiSpark, the data volume is over ten million rows.

| username: TiDBer_AcAczQ2E | Original post link

Executing the query SELECT * from erp_sdb_wms_delivery_items where item_id >7950000 and item_id <=7980000; results in an error: inconsistent index PRIMARY handle count 8192 isn't equal to value count 8191.

| username: TiDBer_AcAczQ2E | Original post link

What is the process you simulated? Here, I am using TiSpark to write code:

df.write.
        format("tidb").
        option("tidb.addr", "172.xxx").
        option("tidb.port", "4000").
        option("tidb.user", "bdata").
        option("tidb.password", "").
        option("database", "bdata").
        option("table", tblName).
        option("replace", true).
        option("isolationLevel", "NONE").
        option(JDBCOptions.JDBC_BATCH_INSERT_SIZE, 1000).
        mode("append").
        save()
| username: TiDBer_jYQINSnf | Original post link

This issue looks quite serious. Keeping an eye on it… Waiting for the official conclusion.

| username: TiDBer_AcAczQ2E | Original post link

Yes, now all the results of the SQL execution are wrong.

| username: TiDBer_jYQINSnf | Original post link

Are you sure there were no modifications to the data between the two queries? If so, please let me know once the official conclusion is available. Thanks :smile:

| username: TiDBer_AcAczQ2E | Original post link

I haven’t made any modifications. I suspect it’s an issue of inconsistency between the index and the data.

| username: 张雨齐0720 | Original post link

This issue exists in distributed systems, the result set instability problem.

| username: 数据小黑 | Original post link

It is recommended to follow the suggestions below to fix the issue and see if the aforementioned phenomenon can be reproduced after the fix.
The official data repair provides the following two commands: admin cleanup and admin recover. The former is suitable for situations where the index data is more than the data row data, and it will delete the excess index data. The latter is suitable for situations where the data row data is more than the index row data, and it will fill in the missing index data.
https://docs.pingcap.com/zh/tidb/stable/sql-statement-admin-checksum-table#语法图

| username: wuxiangdong | Original post link

Paranormal event, previously with MySQL, also encountered this. That table had a particularly large amount of data, and occasionally found a few records with inconsistent query results.

| username: 数据小黑 | Original post link

Are you saying that the results are inconsistent when using primary key queries versus non-primary key queries?