Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TiCDC [CDC:ErrSnapshotLostByGC] error; gc-ttl is set to 172800 but gc_safe_point keeps advancing
【TiDB Usage Environment】Production
【TiDB Version】v5.4.0
【Encountered Problem】
At around 3 AM on the 18th, the changefeed hit an oversized-message exception, which was not handled in time.
After the relevant parameters were adjusted at around 2 PM, the task reported the error ErrSnapshotLostByGC and could not continue.
[CDC:ErrSnapshotLostByGC] fail to create or maintain changefeed due to snapshot loss caused by GC. checkpoint-ts 434656888387272706 is earlier than or equal to GC safepoint at 434667331771695104
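For reference, the two TSO values in the error message embed physical timestamps (the bits above the 18-bit logical counter are milliseconds since the Unix epoch), so they can be decoded to see how far the changefeed checkpoint lagged behind the GC safepoint. A minimal sketch, assuming the standard TiDB TSO layout:

```go
package main

import (
	"fmt"
	"time"
)

// physicalFromTSO extracts the physical (wall-clock) part of a TiDB TSO.
// A TSO is the physical timestamp in milliseconds shifted left by 18 bits,
// plus an 18-bit logical counter in the low bits.
func physicalFromTSO(tso uint64) time.Time {
	ms := int64(tso >> 18)
	return time.UnixMilli(ms)
}

func main() {
	checkpointTS := uint64(434656888387272706) // checkpoint-ts from the error
	safepointTS := uint64(434667331771695104)  // GC safepoint from the error

	cp := physicalFromTSO(checkpointTS)
	sp := physicalFromTSO(safepointTS)
	fmt.Println("checkpoint:", cp.UTC())
	fmt.Println("safepoint: ", sp.UTC())
	fmt.Println("gap:       ", sp.Sub(cp))
}
```

Decoded this way, the checkpoint corresponds to roughly 02:20 on the 18th (UTC+8) and the GC safepoint to roughly 13:24 the same day, a gap of about 11 hours, which is well under the configured gc-ttl of 172800 seconds (48 hours).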
【Documentation Description】
The downstream remains abnormal, and TiCDC still fails after multiple retries.
In this scenario, TiCDC saves the task information. Because TiCDC has already set the service GC safepoint in PD, data newer than the replication task's checkpoint is not cleaned by TiKV GC within the validity period of gc-ttl.
gc-ttl: 172800
Why was the data GC’d so quickly, and are there any other parameters to control this?
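For context, the protection described in the documentation works by TiCDC registering a service GC safepoint in PD with a TTL (the gc-ttl value) and refreshing it with its minimum changefeed checkpoint-ts; if the safepoint stops being refreshed for longer than gc-ttl, GC can advance past the checkpoint. Below is a rough sketch of that mechanism using the PD Go client; the import path, PD address, and service ID are illustrative assumptions, not TiCDC's actual code:

```go
package main

import (
	"context"
	"fmt"

	pd "github.com/tikv/pd/client" // assumed import path for the PD client
)

func main() {
	ctx := context.Background()

	// Connect to PD (the address is an assumption for this example).
	cli, err := pd.NewClient([]string{"127.0.0.1:2379"}, pd.SecurityOption{})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// A service GC safepoint asks TiKV GC not to remove data newer than
	// `safePoint` while the TTL is kept alive. TiCDC refreshes this with
	// its minimum changefeed checkpoint-ts.
	const gcTTL = 172800 // seconds, matching the gc-ttl in question
	checkpointTS := uint64(434656888387272706)

	minSafePoint, err := cli.UpdateServiceGCSafePoint(ctx, "ticdc-example", gcTTL, checkpointTS)
	if err != nil {
		panic(err)
	}

	// PD returns the minimum safepoint across all services; if it is already
	// ahead of the requested checkpoint-ts, the snapshot is no longer
	// protected, which is what ErrSnapshotLostByGC reports.
	fmt.Println("minimum service GC safepoint:", minSafePoint)
}
```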
【Reproduction Path】What operations were performed to cause the problem
【Problem Phenomenon and Impact】
【Attachments】
- Relevant logs, configuration files, Grafana monitoring (https://metricstool.pingcap.com/)
- TiUP Cluster information
- TiUP Cluster Edit config information
- TiDB-Overview monitoring
- Corresponding module Grafana monitoring (if any, such as BR, TiDB-binlog, TiCDC, etc.)
- Corresponding module logs (including logs one hour before and after the problem)
20220718TiCDC问题排查确认.txt (14.9 KB)
cdc0718.log.tar.gz (8.4 MB)
If the question is about performance optimization or fault troubleshooting, please download and run the diagnostic script, then select all of the terminal output and paste it here for upload.