Question About How the TiCDC ticdc_processor_checkpoint_ts_lag Metric Is Calculated

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TICDC ticdc_processor_checkpoint_ts_lag指标算法问题

| username: TiDBer_Bj7TDCTL

[TiDB Usage Environment] Testing
[TiDB Version] 5.3.2
[Reproduction Path]
TiCDC performs one-way synchronization between two clusters

The source cluster has the tidb_super_read_only parameter enabled and no longer accepts writes,
but the ticdc_processor_checkpoint_ts_lag metric is still greater than 1s.

[Encountered Problem: Problem Phenomenon and Impact] We expected that after enabling read-only on the source cluster, ticdc_processor_checkpoint_ts_lag would drop to 0 once synchronization finished, so that we could tell whether the downstream is fully synchronized, i.e., RPO=0. In practice the metric never reaches 0, so it cannot be used to make this determination.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]


| username: Billmay表妹 | Original post link

Refer to System Variables | PingCAP Documentation Center

| username: TiDBer_Bj7TDCTL | Original post link

When both tidb_super_read_only and tidb_restricted_read_only are enabled on the source side, the ticdc_processor_checkpoint_ts_lag metric still exceeds 1 second and fluctuates within a small range. For example, in my current environment, the sampled values are 1.50, 2.19, and 2.15, and it has not become 0.

| username: yilong | Original post link

  1. What is the purpose of enabling read-only mode upstream? Is it to prevent TiCDC from synchronizing data?
  2. What is the downstream of the changefeed: Kafka or TiDB?
  3. When you check the TiCDC logs, is any new data still being synchronized after read-only mode was enabled?
  4. Have you tried a version later than 6.2 to see whether the same issue exists? TiDB checks the cluster's read-only flag before executing SQL statements. Starting from v6.2.0, it also checks the flag before committing them, which prevents long-running auto-commit statements from modifying data after the server has been set to read-only mode.

| username: TiDBer_Bj7TDCTL | Original post link

  1. The purpose of enabling read-only mode is to make the RPO zero, used for disaster recovery drills between two TiDB clusters.
  2. The downstream task connects to TiDB.
  3. I did not check the logs for new synchronization activity after enabling read-only mode.
  4. We are currently on version 5.3.2 and have not tried 6.2.

| username: neilshen | Original post link

The processor lag is never zero because of the way it is calculated, as follows:

processor_lag = pd_leader_current_ts - ticdc_current_processor_checkpoint_ts

In other words, the lag is relative to the physical time on the PD leader.
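
To illustrate the formula, here is a minimal Python sketch. It assumes both values are TSO timestamps in the usual TiDB layout (physical Unix milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names and sample values are hypothetical, and the metric's real implementation inside TiCDC may differ in detail.

```python
LOGICAL_BITS = 18  # assumed TSO layout: low 18 bits hold the logical counter


def tso_physical_ms(tso: int) -> int:
    """Return the physical (wall-clock) part of a TSO in Unix milliseconds."""
    return tso >> LOGICAL_BITS


def processor_lag_seconds(pd_current_tso: int, checkpoint_tso: int) -> float:
    """Lag as described in the reply: PD leader's current time minus the checkpoint."""
    delta_ms = tso_physical_ms(pd_current_tso) - tso_physical_ms(checkpoint_tso)
    return delta_ms / 1000.0


# Illustrative values only: the checkpoint was produced at some instant,
# and PD's clock has moved 1.5 s further by the time the lag is sampled.
checkpoint = 1_700_000_000_000 << LOGICAL_BITS
pd_now = (1_700_000_000_000 + 1_500) << LOGICAL_BITS
print(processor_lag_seconds(pd_now, checkpoint))  # 1.5
```

Because the PD leader's clock keeps moving while the checkpoint only advances in steps, the difference stays positive even when no new data is written upstream.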


In this situation, you don’t need to rely on lag to make a judgment. Simply check the changefeed checkpoint ts. Once the checkpoint timestamp exceeds the timestamp when read-only mode was enabled, it indicates that synchronization is complete.
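
The check suggested above can be sketched as follows. This is an illustration only: it assumes you have already obtained the changefeed checkpoint ts as an integer TSO and that you recorded the wall-clock time at which read-only mode was enabled. The helper names are hypothetical, and the TSO layout (physical milliseconds shifted left by 18 bits) is the same assumption as in the lag formula.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # assumed TSO layout: low 18 bits hold the logical counter


def tso_from_datetime(dt: datetime) -> int:
    """Build a TSO whose physical part is dt and whose logical part is 0."""
    return round(dt.timestamp() * 1000) << LOGICAL_BITS


def replication_caught_up(checkpoint_tso: int, read_only_tso: int) -> bool:
    """True once the changefeed checkpoint has passed the read-only switch."""
    return checkpoint_tso > read_only_tso


# Hypothetical drill: read-only mode was enabled at 12:00:00 UTC.
read_only_at = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
read_only_tso = tso_from_datetime(read_only_at)

# Two checkpoint samples, one taken before and one after the switch.
earlier = tso_from_datetime(read_only_at - timedelta(seconds=1))
later = tso_from_datetime(read_only_at + timedelta(seconds=2))

print(replication_caught_up(earlier, read_only_tso))  # False
print(replication_caught_up(later, read_only_tso))    # True
```

A timestamp taken from a local clock is only approximate relative to PD's clock, so it may be safer to wait until the checkpoint is ahead of the recorded time by a comfortable margin.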
