After migrating TiDB, data insertion continuously reports 'for key PRIMARY'

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: After migrating TiDB with BR, data writes keep reporting "for key 'PRIMARY'"

| username: leoones

[Overview] Scenario + Problem Overview
Order data.
Migrated a database from a v5.4.0 cluster to a new v5.4.1 cluster using BR.
Full and incremental migration completed and business traffic was switched over; data writes then began failing in large volumes with "for key 'PRIMARY'" errors.
The TiDB backend also continuously logs "for key 'PRIMARY'" together with:
[tikv:1205]Lock wait timeout exceeded; try restarting transaction

Logs: (screenshot omitted from the translated post)

Monitoring: (screenshot omitted from the translated post)

Table Structure:
CREATE TABLE `item_trace` (
  `id` bigint(20) NOT NULL /*T![auto_rand] AUTO_RANDOM(4) */,
  `api_type` varchar(20) NOT NULL DEFAULT 'sp',
  `asid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `seller_id` varchar(100) NOT NULL DEFAULT '',
  `marketplace_id` varchar(50) NOT NULL DEFAULT '',
  `region` char(2) NOT NULL DEFAULT '' COMMENT '',
  `amazon_order_id` varchar(50) NOT NULL DEFAULT '' COMMENT '',
  `order_last_update_time` int(10) NOT NULL DEFAULT '0' COMMENT '',
  `purchase_date` int(10) NOT NULL DEFAULT '0' COMMENT '',
  `fulfillment_channel` varchar(100) NOT NULL DEFAULT '',
  `sales_channel` varchar(100) NOT NULL DEFAULT '',
  `next_sync_time` int(10) NOT NULL DEFAULT '0',
  `next_token` varchar(2500) NOT NULL DEFAULT '',
  `gmt_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Data update time',
  `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Data creation time',
  `timezone` varchar(20) NOT NULL COMMENT 'Time zone',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `uk-order-id` (`seller_id`,`marketplace_id`,`amazon_order_id`),
  KEY `idx_update_time` (`id`,`order_last_update_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![auto_rand_base] AUTO_RANDOM_BASE=403971164 */

Problem Points:
After the business switch, traffic has been routed to the new cluster.
During batch writes, none of the services' insert statements explicitly specify the primary key id, yet every insert reports a primary key id conflict.
Could migrating data from v5.4.0 to v5.4.1 cause conflicts on an unspecified (auto-generated) primary key id?

[Phenomenon] Business and Database Phenomenon
At the time, the business write traffic was very high.
Subsequently switching back to the original v5.4.0 did not result in “for key ‘PRIMARY’” errors.

[Business Impact]
Data write failures leading to a large number of transaction timeouts.
Increased data latency.

[TiDB Version]
v5.4.1

| username: xfworld | Original post link

What method is used for the primary key? Clustered index?

| username: leoones | Original post link

Yes, it is clustered_index.

| username: xfworld | Original post link

Is this error also occurring when directly performing batch inserts without syncing through CDC?

| username: xfworld | Original post link

What is the structure of this table? Can you share it?

| username: leoones | Original post link

CDC synchronization is too slow, delayed by several hours, and the checkpoint is not progressing.

| username: xfworld | Original post link

This is another question… :joy:

You should start a new thread, explain the background and some phenomena clearly, and it’s best to include the key logs as well.

| username: leoones | Original post link

The table structure has been updated.

| username: xfworld | Original post link

UNIQUE KEY uk-order-id (seller_id, marketplace_id, amazon_order_id),

You can check if this unique key has a conflict.

If the primary key is auto_random, a conflict is impossible…

| username: leoones | Original post link

If it were a unique-constraint conflict, the error should be for key 'uk-order-id'. After importing data with BR, should we refresh the AUTO_RANDOM base so that the IDs generated for later inserts are greater than the current max ID?
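For reference, both values this question turns on can be read from SQL: the current rebase point appears in the `/*T![auto_rand_base] AUTO_RANDOM_BASE=... */` comment of the DDL, and the largest sequential part already present in the data can be computed by masking off the shard bits. A minimal sketch, assuming `AUTO_RANDOM(4)` on a signed bigint (1 sign bit, 4 shard bits, 59 sequential bits):

```sql
-- Current rebase point: shown as /*T![auto_rand_base] AUTO_RANDOM_BASE=... */
SHOW CREATE TABLE item_trace;

-- Largest sequential (auto-increment) part among existing rows:
-- mask off the top shard bits and keep the low 59 bits.
SELECT MAX(id & ((1 << 59) - 1)) AS max_seq_part FROM item_trace;
```

If `max_seq_part` is at or above the table's `AUTO_RANDOM_BASE`, newly allocated IDs can land on existing rows whenever the random shard bits also repeat.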

| username: xfworld | Original post link

auto_random is an enhanced version of the Snowflake algorithm, which will not cause conflicts even if NTP issues occur.

| username: leoones | Original post link

After importing the data, I checked the max id of the existing rows in this table: 8646911284956014231. After manually inserting a batch of rows, the max id among the newly generated rows was 5764607523438206044. The IDs of subsequently written rows are smaller than the max id of the existing data. Could the IDs generated later collide with the IDs of the existing rows?
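Comparing the raw 64-bit values can be misleading, because the random shard bits in the high positions dominate the magnitude; what matters is the low 59-bit sequential part. Decoding the two IDs quoted above (again assuming `AUTO_RANDOM(4)`: 4 shard bits over 59 sequential bits) can be done directly in SQL:

```sql
-- Split each id into its shard bits (top 4 bits after the sign bit)
-- and its 59-bit sequential part.
SELECT 8646911284956014231 >> 59 AS shard,                -- 15
       8646911284956014231 & ((1 << 59) - 1) AS seq_part; -- 404661911 (existing max)

SELECT 5764607523438206044 >> 59 AS shard,                -- 10
       5764607523438206044 & ((1 << 59) - 1) AS seq_part; -- 403971164 (newly inserted)
```

The newly generated sequential part (403971164) is below the existing maximum (404661911), so the allocator on the new cluster is handing out sequential values in a range the restored data already occupies; a collision occurs as soon as the random shard bits also match an existing row, which surfaces exactly as a Duplicate entry ... for key 'PRIMARY' error.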

| username: xfworld | Original post link

I still recommend setting NTP consistently to reduce this possibility.

The high-order (shard) bits are only there to scatter writes and avoid write hotspots.

| username: leoones | Original post link

NTP is consistent between nodes.

| username: xfworld | Original post link

No problem then~ :grinning:

| username: neilshen | Original post link

After the full restoration is completed and before the incremental restoration, was there any writing to the new cluster? If there was, it would change the state of the downstream cluster, making it impossible to perform incremental restoration on it. This is because the new writes might use the same primary IDs as those in the incremental data, leading to conflicts in subsequent inserts.

| username: luancheng | Original post link

May I ask which version of BR you are using? This looks like a known bug where BR does not rebase the auto-random base after an incremental restore. The issue was fixed in BR v5.4.1 and later versions: br: incremental restore should rebase auto increment id even table exists · Issue #33596 · pingcap/tidb · GitHub
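For a cluster already hit by this, TiDB's AUTO_RANDOM documentation describes rebasing manually with ALTER TABLE (normally this statement is only issued by recovery tools). A sketch of the workaround; the concrete value below is illustrative and must be derived from your own data as something safely above the largest sequential part already present:

```sql
-- Workaround sketch after an incremental restore that skipped the rebase:
-- push the allocator's base past the largest sequential part in the data,
-- so newly allocated ids cannot fall into the already-occupied range.
-- 500000000 is a hypothetical value chosen above the observed max of 404661911.
ALTER TABLE item_trace AUTO_RANDOM_BASE = 500000000;
```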

| username: Hacker007 | Original post link

Generally, this error means the key already exists in the downstream table. It appears on my side as well, but I am not sure why it is there.

| username: leoones | Original post link

The new cluster has not written any data before the incremental.

| username: leoones | Original post link

The BR version is v5.4.0:
Release Version: v5.4.0
Git Commit Hash: 55f3b24c1c9f506bd652ef1d162283541e428872
Git Branch: heads/refs/tags/v5.4.0
Go Version: go1.16.4
UTC Build Time: 2022-01-25 08:36:34
Race Enabled: false