[TiDB Usage Environment] Testing
[TiDB Version] 5.3.2
[Reproduction Path]
TiCDC performs one-way synchronization between two clusters
The tidb_super_read_only variable is enabled on the source cluster, so it no longer accepts writes
But the ticdc_processor_checkpoint_ts_lag metric still stays above 1 s
[Encountered Problem: Problem Phenomenon and Impact] The expectation was that, after enabling read-only on the source, ticdc_processor_checkpoint_ts_lag would drop to 0 once synchronization completed, which would let us confirm that the data is fully synchronized, i.e., RPO = 0. In practice, this determination cannot be made.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
With both tidb_super_read_only and tidb_restricted_read_only enabled on the source side, the ticdc_processor_checkpoint_ts_lag metric still exceeds 1 second and fluctuates within a narrow range. For example, in my current environment the sampled values are 1.50 s, 2.19 s, and 2.15 s; the metric never reaches 0.
What is the purpose of enabling read-only mode upstream? To prevent using ticdc to synchronize data?
What is the downstream task connected to? Kafka, TiDB?
Do the TiCDC logs show any new data being synchronized after read-only was enabled?
Have you tried a version after v6.2 to see whether the same issue exists? Before v6.2.0, TiDB checked the cluster's read-only flag only before executing SQL statements; starting from v6.2.0, TiDB also checks this flag before committing SQL statements, to prevent long-running auto-commit statements from modifying data after the server has been set to read-only mode.
In other words, the lag is relative to the physical time on the PD leader.
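Concretely: a TiDB TSO packs a physical timestamp (milliseconds since the Unix epoch) in its upper bits and a logical counter in the lower 18 bits, so the lag is roughly "PD-leader wall-clock time minus the checkpoint's physical part". A minimal Python sketch of that arithmetic (the helper names are mine, not TiCDC's):

```python
import time

LOGICAL_BITS = 18  # the lower 18 bits of a TSO hold the logical counter

def tso_physical_ms(tso: int) -> int:
    """Extract the physical part of a TSO: milliseconds since the Unix epoch."""
    return tso >> LOGICAL_BITS

def checkpoint_lag_seconds(checkpoint_tso: int, now_ms: int) -> float:
    """Approximate ticdc_processor_checkpoint_ts_lag:
    PD-leader 'now' minus the checkpoint's physical time, in seconds."""
    return (now_ms - tso_physical_ms(checkpoint_tso)) / 1000.0

# Example with a fixed "now"; in reality 'now' is the PD leader's clock,
# which is why the lag hovers above 0 even when nothing is being written.
now_ms = 1_700_000_000_000
tso = (now_ms - 2150) << LOGICAL_BITS  # checkpoint 2150 ms behind "now"
print(checkpoint_lag_seconds(tso, now_ms))  # → 2.15
```

This is also why a small residual lag (like the 1.5-2.2 s sampled above) does not by itself mean data is missing: it only means the checkpoint trails the PD leader's clock by that much.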
In this situation you don't need to rely on the lag metric to make the judgment. Simply check the changefeed checkpoint-ts: once the checkpoint timestamp passes the timestamp at which read-only mode was enabled, synchronization is complete.
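That check can be sketched as a simple TSO comparison: record a TSO right after enabling read-only, then poll the changefeed's checkpoint-ts (e.g. from `tiup cdc cli changefeed query` output) until it passes that mark. A hedged Python sketch; the function name and the sample values are illustrative, not part of any TiCDC API:

```python
LOGICAL_BITS = 18  # lower 18 bits of a TSO are the logical counter

def fully_synchronized(checkpoint_tso: int, readonly_enabled_tso: int) -> bool:
    """RPO = 0 check: everything written before the cluster became read-only
    has been replicated once the checkpoint passes the read-only TSO."""
    return checkpoint_tso >= readonly_enabled_tso

# Hypothetical values: read-only enabled at physical time T (ms since epoch)
T = 1_700_000_000_000
readonly_tso = T << LOGICAL_BITS
print(fully_synchronized((T - 2000) << LOGICAL_BITS, readonly_tso))  # → False
print(fully_synchronized((T + 500) << LOGICAL_BITS, readonly_tso))   # → True
```

In a script you would poll the checkpoint-ts on an interval and declare RPO = 0 the first time this returns True, regardless of what the lag metric reads.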