[TiDB Usage Environment] PoC
[TiDB Version] 5.4.3
[Reproduction Path] Using the TiKV Java client, I sent several CDC requests with the same startTs to PD. The incremental scan data pushed back differed from one request to the next.
[Encountered Problem: Phenomenon and Impact]
Repeated CDC requests with the same startTs return different incremental scan data each time, even though the backlog of CDC events is within the in-memory limit.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Logs/Monitoring]
Between the earlier and later operations, you need to make sure GC is not triggered while the data is being synchronized; otherwise the old data will be cleaned up.
When troubleshooting, I recommend checking GC first. You can lengthen the cluster's GC time so that data is retained long enough, and then verify again.
After accumulating 1 million change events, I stopped all changes and then set the same startTs to fetch CDC events. I found that most of the pushed data was consistent, but a small portion was inconsistent. Consuming this data then led to data inconsistency.
As far as I know, the default size of CDC events stored in memory by TiKV is 512MB. The backlog of my data does not exceed 512MB, but it is still inconsistent. Is this because a GC occurred in the middle? If a GC occurs, wouldn’t the data scanned incrementally be highly inconsistent?
The situation you encountered is not necessarily caused by GC. I only offered it as a troubleshooting idea; you can first confirm whether it is the cause.
GC is generally controlled by time, not by data size.
In my experience, many TiCDC issues are related to the GC safepoint. You can check the official documentation for more details.
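Since GC is driven by time, one quick way to rule it in or out is to convert the startTs into wall-clock time and compare it with the GC life time. Below is a minimal sketch, assuming the standard TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The class `TsoAge` and its methods are hypothetical helpers, not part of the TiKV Java client, and the GC life time you pass in should be whatever your cluster is configured with (the `tidb_gc_life_time` variable, 10 minutes by default). The real safepoint is maintained by the cluster, so treat the result only as a rough indication.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of the TiKV Java client.
final class TsoAge {
    // Assumption: a TSO stores physical milliseconds above an 18-bit logical counter.
    static final int LOGICAL_BITS = 18;

    // Extract the physical part (milliseconds since the Unix epoch).
    static long physicalMillis(long tso) {
        return tso >>> LOGICAL_BITS;
    }

    // Wall-clock time that the TSO corresponds to.
    static Instant toInstant(long tso) {
        return Instant.ofEpochMilli(physicalMillis(tso));
    }

    // True if startTs is older than (now - gcLifeTime), i.e. versions
    // needed at startTs may already have been garbage collected.
    static boolean mayBeBehindGc(long startTs, Instant now, Duration gcLifeTime) {
        return toInstant(startTs).isBefore(now.minus(gcLifeTime));
    }
}
```

For example, `TsoAge.mayBeBehindGc(startTs, Instant.now(), Duration.ofMinutes(10))` returning `false` for every request suggests that GC is probably not what is dropping the events.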
It doesn’t seem likely to be a GC issue for the following reasons:
I use the TiKV client in Java to send CDC requests with startTs to PD.
The actual changes involve data with handles 1, 2, 3, and 4.
The incremental scan data pushed in the first request includes:
handle: 1
handle: 2
handle: 4
The incremental scan data pushed in the second request includes:
handle: 1
handle: 3
handle: 4
As shown in the example above, each incremental scan data push loses some data. If it were due to GC, the second push shouldn’t include the same data again.
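To make the comparison above repeatable, the handles received in each request can be collected into a set and diffed against the handles that were actually changed. This is only a sketch of that bookkeeping: `ScanDiff` is a hypothetical helper, and how the handles are extracted from the pushed events is left out because it depends on the client code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the TiKV Java client.
final class ScanDiff {
    // Returns the handles that were expected but not seen in one scan, in sorted order.
    static Set<Long> missing(Set<Long> expected, Set<Long> seen) {
        Set<Long> out = new TreeSet<>(expected);
        out.removeAll(seen);
        return out;
    }

    public static void main(String[] args) {
        Set<Long> changed = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> first = new TreeSet<>(Arrays.asList(1L, 2L, 4L));
        Set<Long> second = new TreeSet<>(Arrays.asList(1L, 3L, 4L));
        System.out.println("missing in first:  " + missing(changed, first));
        System.out.println("missing in second: " + missing(changed, second));
    }
}
```

With the handles from the example, the first request is missing handle 3 and the second is missing handle 2. Logging this per request, together with the region each handle belongs to, would show whether the gaps follow a pattern.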
As a novice, I humbly seek advice. Is this a bug or something else? How should I troubleshoot this?
Thank you very much for the guidance from all the experts!