Slow Query with ON DUPLICATE KEY UPDATE

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ON DUPLICATE KEY UPDATE 慢查询

| username: TiDB_C罗

Why is a single insert so slow, and how can I troubleshoot the cause?

| username: zhanggame1 | Original post link

This is not an ordinary insert; all of the fields need to be compared.

| username: TiDB_C罗 | Original post link

The account_id is the primary key. I always understood that only account_id is compared. How can I verify if other fields are also being compared? If that’s the case, can I completely replace it with “replace into”?

| username: zhanggame1 | Original post link

Maybe I got it wrong; you probably don’t need to compare all the fields. Is that `base` field a JSON type? JSON is relatively slow.

| username: ti-tiger | Original post link

The insert statement involves a primary key or unique key conflict, which requires TiDB to first check for conflicting records before performing the corresponding insert or update operation. This process may increase disk seek overhead, especially when the dataset is large or the primary key or unique key is not sequential.
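
For illustration, here is a minimal sketch of that read-then-write pattern against a hypothetical `account` table (the real schema is not shown in this thread); running `EXPLAIN ANALYZE` on the statement should surface the per-step timings, including the conflict-check read:

```sql
-- Hypothetical schema; only account_id being the primary key is known
-- from the thread.
CREATE TABLE account (
    account_id BIGINT PRIMARY KEY,
    balance    BIGINT,
    base       JSON
);

-- On a key conflict, TiDB first reads the existing row for account_id
-- (the "prefetch" step, sent to TiKV as a BatchGet-style read), then
-- decides between insert and update.
INSERT INTO account (account_id, balance, base)
VALUES (1001, 50, '{"level": 1}')
ON DUPLICATE KEY UPDATE
    balance = VALUES(balance),
    base    = VALUES(base);

-- EXPLAIN ANALYZE on the same statement shows the execution info,
-- which is where timings like prefetch and get_snapshot_time come from.
EXPLAIN ANALYZE
INSERT INTO account (account_id, balance, base)
VALUES (1001, 50, '{"level": 1}')
ON DUPLICATE KEY UPDATE balance = VALUES(balance);
```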

| username: Kongdom | Original post link

Are there any read-write hotspots?

prefetch: 324.5ms

prefetch: The time taken to fetch the data needed to check for conflicts from TiKV. This step mainly involves sending BatchGet type RPC requests to TiKV to retrieve data.

get_snapshot_time: 324.1ms
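
One way to check for write pressure is to look at the slow query records for these inserts and see which phase the time goes into. A rough sketch, assuming the standard TiDB `INFORMATION_SCHEMA.SLOW_QUERY` columns:

```sql
-- Slowest recent statements of this shape, with the commit-phase timings
-- that would indicate write pressure. Column names assume the standard
-- TiDB slow query table layout.
SELECT `Time`, Query_time, Prewrite_time, Commit_time, Write_keys, Query
FROM INFORMATION_SCHEMA.SLOW_QUERY
WHERE LOWER(Query) LIKE '%on duplicate key update%'
ORDER BY Query_time DESC
LIMIT 10;
```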

| username: TiDB_C罗 | Original post link

From the traffic visualization, there are no read/write hotspots.

| username: TiDB_C罗 | Original post link

Why is it BatchGet? My understanding is that it should be Batch_Point_Get.

| username: TiDB_C罗 | Original post link

In MySQL, “REPLACE INTO” first deletes a row and then inserts a new one. In TiDB, it doesn’t need to delete; it can directly insert, thus avoiding the comparison operation.
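
One caveat before swapping the statement wholesale: the two forms are not always interchangeable, because REPLACE rewrites the entire row. A sketch against the same hypothetical `account` table:

```sql
-- REPLACE INTO overwrites the whole conflicting row; any column not
-- listed here (e.g. base) falls back to its default value.
REPLACE INTO account (account_id, balance)
VALUES (1001, 60);

-- ON DUPLICATE KEY UPDATE only modifies the columns named in the UPDATE
-- clause; the other columns of the existing row are preserved.
INSERT INTO account (account_id, balance)
VALUES (1001, 60)
ON DUPLICATE KEY UPDATE balance = VALUES(balance);
```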

| username: Kongdom | Original post link

I remember that this way of writing is somewhat better than using replace.

| username: Kongdom | Original post link

Take a look at this: get_snapshot_time is 324.1ms.

Is there write pressure?

| username: TiDB_C罗 | Original post link

[The original reply was an image, which could not be translated.]

| username: Kongdom | Original post link

Then I’m out of good ideas.
Or you can verify the time and execution plan for these three scenarios (a sketch follows the list):

  1. Data without primary key conflicts
  2. Data with primary key conflicts
  3. Data without primary key conflicts and without ON DUPLICATE KEY
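
A rough sketch of that comparison with `EXPLAIN ANALYZE`, reusing the hypothetical `account` table from above (compare the prefetch/prewrite/commit timings in the execution info; note that `EXPLAIN ANALYZE` actually executes the statements):

```sql
-- 1. No primary key conflict, with ON DUPLICATE KEY UPDATE
EXPLAIN ANALYZE
INSERT INTO account (account_id, balance) VALUES (9001, 10)
ON DUPLICATE KEY UPDATE balance = VALUES(balance);

-- 2. Primary key conflict, with ON DUPLICATE KEY UPDATE
EXPLAIN ANALYZE
INSERT INTO account (account_id, balance) VALUES (1001, 10)
ON DUPLICATE KEY UPDATE balance = VALUES(balance);

-- 3. No primary key conflict, plain INSERT
EXPLAIN ANALYZE
INSERT INTO account (account_id, balance) VALUES (9002, 10);
```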
| username: zhanggame1 | Original post link

If it’s a JSON field, the database still needs to perform validation.

| username: TiDB_C罗 | Original post link

What about replace?