What is a better replacement for DUPLICATE KEY?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DUPLICATE KEY 用什么替换比较好

| username: tidb狂热爱好者

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots / Logs / Monitoring]

INSERT INTO
  `t_order` (
    `id`,
  )
VALUES
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...) ON DUPLICATE KEY
UPDATE
  `avail` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`avail`),
      `avail`
  ),
  `iso_avail` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`iso_avail`),
      `iso_avail`
  ),
  `iso_hold` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`iso_hold`),
      `iso_hold`
  ),
  `trd_hold` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`trd_hold`),
      `trd_hold`
  ),
  `non_trd_hold` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`non_trd_hold`),
      `non_trd_hold`
  ),
  `net_debt` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`net_debt`),
      `net_debt`
  ),
  `u_time` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`u_time`),
      `u_time`
  ),
  `state` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`state`),
      `state`
  ),
  `eq` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`eq`),
      `eq`
  ),
  `net_mgn` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`net_mgn`),
      `net_mgn`
  ),
  `order_lost` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`order_lost`),
      `order_lost`
  ),
  `asset_value` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`asset_value`),
      `asset_value`
  ),
  `debt` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`debt`),
      `debt`
  ),
  `mm` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`mm`),
      `mm`
  ),
  `upl` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`upl`),
      `upl`
  ),
  `version` = IF (
    VALUES
      (`version`) > `version`,
    VALUES
      (`version`),
      `version`
  )
| username: Kongdom | Original post link

:thinking: If it repeats without updating, use INSERT IGNORE INTO.

| username: residentevil | Original post link

If there is a UK in the table, you can consider using the replace into command.

| username: WalterWj | Original post link

Use the SELECT statement and WHERE clause to determine which rows should be updated.
Based on the results from the first step, construct an INSERT … ON DUPLICATE KEY UPDATE statement that includes only the rows that need to be updated.

| username: redgame | Original post link

The methods mentioned by the three above are all feasible. Another way is to handle it on the front end.

| username: TiDBer_aaO4sU46 | Original post link

“insert ignore into, this works, we’ve used it before.”

| username: Kongdom | Original post link

It seems that “REPLACE INTO” is not recommended due to performance issues.

| username: Jellybean | Original post link

In cases with relatively few conflicts, the write performance of REPLACE seems to be quite good, easily handling twenty to thirty thousand.

| username: Kongdom | Original post link

Yes, I remember searching before, there are issues in some cases, but overall it’s still pretty good.

| username: 随缘天空 | Original post link

You can use REPLACE INTO, that is, change the SQL to the following statement:

REPLACE INTO
  `t_order` (
    `id`,
  )
VALUES
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...),
  (...) 

However, I’m not sure about the performance. You can test it out.

| username: residentevil | Original post link

Is the “insert into on duplicate key xx” method feasible?

| username: zhanggame1 | Original post link

You can directly search for the differences on Baidu; it is consistent with MySQL.

| username: Kongdom | Original post link

Feasible, mainly depends on the scenario. We use all these methods.

| username: FutureDB | Original post link

The tidb mode of TiDB Lightning also uses replace for batch data import, with each batch generally around 1000 rows. However, compared to importing data into large tables, there is still a noticeable impact on CPU usage.

| username: FutureDB | Original post link

In concurrent and parallel scenarios, this approach requires attention to avoid conflicts. Once there are many conflicts, it is easy to encounter lock-related issues, which can significantly impact performance.