Why is the incremental scan data pushed by TiCDC inconsistent when the same startTs is chosen?

translator_bot · June 21, 2024, 6:56pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 为什么TiCDC选择同样的startTs，推送的增量扫数据是不一致的

| username: 迷人的Ti

[TiDB Usage Environment] PoC
[TiDB Version] 5.4.3
[Reproduction Path] Using tikv-client-java, I sent CDC requests with the same startTs to PD several times. The incremental scan data pushed back was different each time.
[Encountered Problem: Phenomenon and Impact]
The incremental scan data differs between requests that use the same startTs, even though the CDC backlog is within the in-memory limit.

translator_bot · June 21, 2024, 6:56pm

| username: Fly-bird

Have you tried triggering the CDC push manually to see whether the incremental data is consistent?

translator_bot · June 21, 2024, 6:56pm

| username: Jellybean

If the results differ between one run and the next, first make sure that GC is not triggered during synchronization; otherwise the data will be cleaned up.

When troubleshooting, check GC first. You can lengthen the cluster’s GC life time so that data is retained long enough, and then verify again.
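As a rough way to check whether a startTs could already be behind GC, you can decode its physical time and compare it with the GC life time. The sketch below assumes the usual TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits) and the default GC life time of 10 minutes. The class and method names are made up for illustration, and the real GC safepoint is tracked by the cluster, so treat this only as an estimate.

```java
import java.time.Duration;
import java.time.Instant;

public class StartTsGcCheck {
    // Assumed TSO layout: low 18 bits are a logical counter,
    // the remaining high bits are physical time in milliseconds.
    static final int LOGICAL_BITS = 18;

    // Extract the wall-clock time encoded in a TSO such as startTs.
    public static Instant physicalTime(long tso) {
        return Instant.ofEpochMilli(tso >>> LOGICAL_BITS);
    }

    // True if startTs is older than (now - gcLifeTime), i.e. old versions
    // needed by a scan from startTs may already have been collected.
    public static boolean olderThanGcLifeTime(long startTs, Instant now, Duration gcLifeTime) {
        return physicalTime(startTs).isBefore(now.minus(gcLifeTime));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-06-21T12:00:00Z");
        // Hypothetical startTs taken 30 minutes before "now".
        long startTs = now.minus(Duration.ofMinutes(30)).toEpochMilli() << LOGICAL_BITS;
        Duration gcLifeTime = Duration.ofMinutes(10); // default GC life time

        System.out.println("startTs physical time: " + physicalTime(startTs));
        System.out.println("older than GC life time: "
                + olderThanGcLifeTime(startTs, now, gcLifeTime));
    }
}
```

If the check returns true for the startTs used in the test, lengthening the GC life time before re-running is a reasonable first step.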

translator_bot · June 21, 2024, 6:56pm

| username: cassblanca

Pay attention to the GC life time.

translator_bot · June 21, 2024, 6:56pm

| username: 迷人的Ti

After accumulating 1 million change events, I stopped all changes and then set the same startTs to fetch CDC events. I found that most of the pushed data was consistent, but a small portion was inconsistent. Consuming this data then led to data inconsistency.

translator_bot · June 21, 2024, 6:56pm

| username: 迷人的Ti

Does tikv-client-java support adjusting the GC time? Or does the GC time need to be adjusted during deployment?

translator_bot · June 21, 2024, 6:56pm

| username: 迷人的Ti

As far as I know, the default size of CDC events stored in memory by TiKV is 512MB. The backlog of my data does not exceed 512MB, but it is still inconsistent. Is this because a GC occurred in the middle? If a GC occurs, wouldn’t the data scanned incrementally be highly inconsistent?

translator_bot · June 21, 2024, 6:56pm

| username: 迷人的Ti

Thank you very much for the guidance, expert.

translator_bot · June 21, 2024, 6:56pm

| username: Jellybean

The situation you encountered is not necessarily caused by GC. I only suggested a troubleshooting direction, and you can first confirm whether GC is the cause.

GC is generally controlled by time, not by data size.

In my experience, many TiCDC issues are related to the GC safepoint. You can check the official documentation for more details.

translator_bot · June 21, 2024, 6:56pm

| username: 迷人的Ti

Okay, thanks again for the guidance. I’ll try adjusting the GC time first and then run a few more tests to see.

translator_bot · June 21, 2024, 6:56pm

| username: 迷人的Ti

It doesn’t seem to be a GC issue, for the following reasons:
I use tikv-client-java to send CDC requests with the same startTs to PD.
The actual changes involve the rows with handles 1, 2, 3, and 4.
The incremental scan data pushed in the first request includes:
handle: 1
handle: 2
handle: 4
The incremental scan data pushed in the second request includes:
handle: 1
handle: 3
handle: 4
As the example shows, each incremental scan push is missing some data, and what is missing differs between pushes. If GC were the cause, data missing from the first push (handle 3) should not show up in the second push.
As a novice, I would appreciate some advice. Is this a bug or something else? How should I troubleshoot it?
Thank you very much for the guidance from all the experts!
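
To make the comparison in the handle example concrete, here is a minimal sketch that diffs each scan against the set of rows known to have changed. It is plain Java and does not use any tikv-client-java API; the class and method names are hypothetical, and the handle values are the ones from the example.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ScanDiff {
    // Return the handles that were expected but never appeared in a scan.
    public static Set<Long> missingFrom(Set<Long> expected, List<Long> scanned) {
        Set<Long> missing = new TreeSet<>(expected);
        missing.removeAll(scanned);
        return missing;
    }

    public static void main(String[] args) {
        // Handles that actually changed, per the example.
        Set<Long> changed = Set.of(1L, 2L, 3L, 4L);
        // Handles received in the first and second incremental scans.
        List<Long> firstScan = List.of(1L, 2L, 4L);
        List<Long> secondScan = List.of(1L, 3L, 4L);

        System.out.println("missing in scan 1: " + missingFrom(changed, firstScan));
        System.out.println("missing in scan 2: " + missingFrom(changed, secondScan));
    }
}
```

Logging the missing handles per request like this, together with the startTs used, makes it easier to show that different rows are dropped on each run.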