Differences in Using TiCDC to Execute REPLACE Statements in v5.3.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: v5.3.0中使用ticdc同步执行replace语句的差异

| username: du拉松

[TiDB Usage Environment] Production
[TiDB Version] v5.3.0
[Reproduction Path] Using ticdc to synchronize data to Kafka, when executing the replace statement, if the corresponding data does not exist, it will synchronize the insert event data. If the corresponding data exists, it will first synchronize the delete event data and then the insert event data.
However, this is not the strategy for v4.0.16 and v6.5.0. When using these two versions, executing the replace statement will synchronize the insert event data if the corresponding data does not exist, and synchronize the update event statement if the corresponding data exists.
The protocol used is TiCDC Open Protocol.
Question: Is the cause of this difference a bug in v5.3.0? Or is it a strategy specific to this version that requires modification of related configurations? Testing shows that v5.4.3 also has related issues.
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]

| username: redgame | Original post link

No environment, difficult to test, feels like an Easter egg.

| username: du拉松 | Original post link

Hmm, there’s an even bigger surprise. Our testing department deployed v6.5.0 using a standalone deployment method, while I am currently testing a simple cluster deployment. Using the same database name, same table structure, same Kafka version, and the same TOML configuration file for creating changefeed, the results are different. :sweat_smile: The results are as follows:

Standalone version: When executing a replace statement, if the data does not exist, it will sync the insert event data; if the corresponding data exists, it will sync the update event data.
Simple cluster: When executing a replace statement, if the data does not exist, it will sync the insert event data; if the corresponding data exists, it will first sync the delete event data and then sync the insert event data.
I have looked many times but couldn’t find the reason.

| username: 有猫万事足 | Original post link

4.0.16 and 6.5.0 are updated versions with better approaches.

In older versions, performing delete and then insert operations under high concurrency can lead to deadlocks.

Note the following statement in the above link:

* Run sql statements in two transactions:
T1: begin;
T2: begin;
T1: delete from test.order_line where ol_o_id=7220 and ol_d_id=4 and ol_w_id=68 and ol_number = 7;
T2: delete from test.order_line where ol_o_id=7221 and ol_d_id=2 and ol_w_id=58 and ol_number = 4;
T1: REPLACE INTO test.order_line(ol_o_id,ol_d_id,ol_w_id,ol_number,ol_i_id,ol_supply_w_id,ol_delivery_d,ol_quantity,ol_amount,ol_dist_info) VALUES (7219,2,58,4,52661,58,'2021-11-13 02:38:33',5,'185.65',_binary'ZFQZYPNYPLZHNNBWPEIWTSPS');
T2: REPLACE INTO test.order_line(ol_o_id,ol_d_id,ol_w_id,ol_number,ol_i_id,ol_supply_w_id,ol_delivery_d,ol_quantity,ol_amount,ol_dist_info) VALUES (7218,4,58,1,23862,58,'2021-11-13 02:38:33',7,'408.94',_binary'QTCLICHWFMNOODRLLIBOOTUQ');
* At this time, we get error from T2:
Deadlock found when trying to get lock; try restarting transaction

2. Why does this happen?

* In RR isolation level, a gap lock is taken when we delete a nonexistent record. Because two gap locks in two transactions on the same gap don't block each other, so the two delete statements can successfully executed. (Reference [gap lock](https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-gap-locks))
* `REPLACE INTO` statement in T1 tries to insert a record and it tries to take an insert intention lock. And because a gap lock is taken by T2, this `REPLACE INTO` statement is blocked.
* `REPLACE INTO` statement in T2 also tries to insert a record into the same gap as T1, and it also waits to obtain an insert intention lock.
* Circular waiting happens and dead lock occurs.

Using the new versions instead of the old ones will solve the problem.

| username: du拉松 | Original post link

Yes, in the current cluster environment version v6.5.0, it still includes delete statements, while in the single-machine environment, it does not include delete statements. :rofl: Please see my reply for clarification.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.