Why is the incremental scan data pushed by TiCDC inconsistent when the same startTs is chosen?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 为什么TiCDC选择同样的startTs,推送的增量扫数据是不一致的

| username: 迷人的Ti

[TiDB Usage Environment] PoC
[TiDB Version] 5.4.3
[Reproduction Path] Using tikv-client-java, I sent CDC requests with the same startTs to PD several times. The incremental scan data pushed back differed from one request to the next.
[Encountered Problem: Phenomenon and Impact]
The incremental scan results are inconsistent across requests with the same startTs, even though the backlog of CDC events is within the in-memory limit.

| username: Fly-bird | Original post link

Have you tried triggering the CDC push manually to check whether the incremental data is consistent?

| username: Jellybean | Original post link

Since the requests are made at different times, you need to make sure GC is not triggered during synchronization; otherwise, the data will be cleaned up.

When troubleshooting, check GC first. You can lengthen the cluster's GC life time so that data is retained long enough, and then verify again.

| username: cassblanca | Original post link

Pay attention to the GC life time.

| username: 迷人的Ti | Original post link

After accumulating 1 million change events, I stopped all changes and then fetched CDC events repeatedly with the same startTs. Most of the pushed data was consistent between runs, but a small portion was not. Consuming this data then led to data inconsistency.

| username: 迷人的Ti | Original post link

Does tikv-client-java support adjusting the GC time? Or does the GC time need to be adjusted during deployment?

| username: 迷人的Ti | Original post link

As far as I know, the default memory limit for CDC events held by TiKV is 512 MB. My backlog does not exceed 512 MB, but the data is still inconsistent. Is this because a GC occurred in the middle? If a GC occurs, wouldn’t the incrementally scanned data be highly inconsistent?

| username: 迷人的Ti | Original post link

Thank you very much for the guidance, expert.

| username: Jellybean | Original post link

The situation you encountered is not necessarily caused by GC. I only suggested a troubleshooting direction; you can first confirm whether GC is actually the problem.

GC is generally controlled by time, not by data size.

Based on experience, many issues with TiCDC are related to the GC safepoint. You can check the official website for more details.
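One way to check the GC angle is to compare the startTs with the cluster's GC safepoint. The following is a minimal sketch, assuming both values are available as standard TiDB TSO timestamps (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names and sample values are made up for illustration and are not part of any TiDB or TiKV client API.

```python
from datetime import datetime, timezone
from typing import Tuple

LOGICAL_BITS = 18  # a TSO keeps an 18-bit logical counter in its low bits


def compose_tso(physical_ms: int, logical: int = 0) -> int:
    """Build a TSO from a physical time in milliseconds and a logical counter."""
    return (physical_ms << LOGICAL_BITS) | logical


def decode_tso(ts: int) -> Tuple[int, int]:
    """Split a TSO into (physical milliseconds, logical counter)."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)


def tso_to_datetime(ts: int) -> datetime:
    """Wall-clock time (UTC) encoded in the physical part of a TSO."""
    physical_ms, _ = decode_tso(ts)
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def start_ts_is_safe(start_ts: int, gc_safepoint: int) -> bool:
    """Assumed rule: a startTs at or after the safepoint has not been GC'd."""
    return start_ts >= gc_safepoint


# Hypothetical values: a startTs taken 1000 seconds after the safepoint.
safepoint = compose_tso(1_699_999_000_000)
start_ts = compose_tso(1_700_000_000_000, 5)
print(tso_to_datetime(start_ts), start_ts_is_safe(start_ts, safepoint))
```

If the startTs decodes to a time older than the GC life time allows, GC is a plausible suspect; if it is well after the safepoint, the cause is more likely elsewhere.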

| username: 迷人的Ti | Original post link

Okay, thanks again for the guidance. I’ll try adjusting the GC time first and then run a few more tests to see.

| username: 迷人的Ti | Original post link

GC doesn’t seem to be the cause, for the following reasons:
I use tikv-client-java to send CDC requests with the same startTs to PD.
The actual changes involve the rows with handles 1, 2, 3, and 4.
The incremental scan data pushed in the first request includes:
handle: 1
handle: 2
handle: 4
The incremental scan data pushed in the second request includes:
handle: 1
handle: 3
handle: 4
As the example shows, each incremental scan misses some of the data, and a different part each time. If GC were the cause, data missing from the first push (handle 3) should not reappear in the second push.
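To make this kind of comparison repeatable, the handles from each scan can be diffed against the set of rows known to have changed. Here is a minimal sketch in Python using the handle values from the example above; `compare_scans` is a hypothetical helper and not part of tikv-client-java.

```python
from typing import Dict, Iterable, Set


def compare_scans(
    expected: Iterable[int], scans: Dict[str, Iterable[int]]
) -> Dict[str, Set[int]]:
    """For each named scan, return the expected handles it did not deliver."""
    expected_set = set(expected)
    return {name: expected_set - set(handles) for name, handles in scans.items()}


# Handles from the example: rows 1-4 changed, and each scan dropped one of them.
missing = compare_scans(
    expected=[1, 2, 3, 4],
    scans={"first": [1, 2, 4], "second": [1, 3, 4]},
)
for name, handles in missing.items():
    print(name, "is missing handles", sorted(handles))
```

A different handle missing in each run, as here, points away from data having been permanently removed.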
As a novice, I humbly seek advice. Is this a bug or something else? How should I troubleshoot this?
Thank you very much for the guidance from all the experts!