CDC Safe Mode

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc安全模式

| username: TiDBer_20QjYTLl

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
v6.6.0
Regarding TiCDC's safe mode, the documentation says:

Before version v6.1.3, safe-mode was set to true by default, meaning all INSERT and UPDATE statements were converted to REPLACE INTO statements. From version v6.1.3 onwards, the system can automatically determine whether there is duplicate data downstream, and safe-mode is changed to false by default. When the system determines there is no duplicate data downstream, it will directly synchronize INSERT and UPDATE statements.
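To make the quoted behavior concrete, here is a minimal sketch of how a `safe_mode` flag could change the statement generated for an inserted row. The function names `quote` and `build_insert` are hypothetical and are not taken from the TiCDC code base; the sketch only illustrates the INSERT versus REPLACE INTO distinction described in the documentation.

```python
from typing import Sequence


def quote(name: str) -> str:
    """Quote an identifier MySQL-style with backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_insert(table: str, columns: Sequence[str], safe_mode: bool) -> str:
    """Return a parameterized statement for one inserted row.

    Hypothetical helper: with safe_mode=True the row is written with
    REPLACE INTO, which can be re-applied without a duplicate-key error;
    otherwise a plain INSERT INTO is generated.
    """
    verb = "REPLACE INTO" if safe_mode else "INSERT INTO"
    cols = ", ".join(quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"{verb} {quote(table)} ({cols}) VALUES ({placeholders})"


print(build_insert("t", ["id", "v"], safe_mode=True))
print(build_insert("t", ["id", "v"], safe_mode=False))
```

With `safe_mode=True`, replaying the same change twice is harmless; with `safe_mode=False`, a second replay of the same row would hit a primary key or unique key conflict downstream.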

How does TiCDC determine whether there is duplicate data downstream? Does it query the downstream by primary key? I can't find any such check in the code.

| username: WalterWj | Original post link

Before using TiCDC, you need to have valid indexes.

| username: TiDBer_20QjYTLl | Original post link

Which index are you referring to? How do you determine if data is duplicated based on the index of the downstream database?

| username: WalterWj | Original post link

The official website explains valid indexes: TiCDC Overview | PingCAP Documentation Center

| username: TiDBer_20QjYTLl | Original post link

My question is how primary key conflicts or unique index conflicts are detected without using REPLACE INTO. I looked at the code, and there seems to be no logic that checks for unique key conflicts after an insert.

| username: WalterWj | Original post link

Idempotence means executing REPLACE directly.
For v6.1.3, my guess is that when a conflict error occurs, the statement is retried as REPLACE or INSERT IGNORE.
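The guess above can be sketched as follows. This only illustrates the "retry as REPLACE on conflict" idea suggested in this reply; it is not confirmed to be what TiCDC does. It uses Python's built-in `sqlite3` module as a stand-in for a MySQL-compatible downstream, and the table `t` and the helper `apply_insert` are hypothetical.

```python
import sqlite3


def apply_insert(conn, row_id, value):
    """Try a plain INSERT; on a key conflict, retry as REPLACE INTO.

    Returns which kind of statement finally succeeded.
    """
    try:
        conn.execute("INSERT INTO t (id, v) VALUES (?, ?)", (row_id, value))
        return "insert"
    except sqlite3.IntegrityError:
        # Duplicate primary key: the row already exists downstream,
        # so fall back to the idempotent form.
        conn.execute("REPLACE INTO t (id, v) VALUES (?, ?)", (row_id, value))
        return "replace"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

first = apply_insert(conn, 1, "a")   # no existing row, plain INSERT works
second = apply_insert(conn, 1, "b")  # same key again, falls back to REPLACE
final_rows = conn.execute("SELECT id, v FROM t").fetchall()
print(first, second, final_rows)
```

Note that this approach detects duplicates only after the downstream rejects the INSERT, so it needs no up-front query by primary key.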

| username: tidb菜鸟一只 | Original post link

Could it be a direct retry?