Flink CDC TiDB Unable to Start from Latest Checkpoint Position

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: flink cdc tidb 无法从checkpoint最新位置启动

| username: 消息终结者

[TiDB Usage Environment] Production Environment

[TiDB Version] v6.5.2

[Encountered Problem: Phenomenon and Impact]
A Flink CDC TiDB job cannot resume from checkpoint or savepoint state.

  1. The environment used is Alibaba Cloud Flink SQL mode
  2. Checkpoint and savepoint are managed by Alibaba Cloud Flink and stored in OSS
  3. The source table in TiDB has about 50 million rows of data
  4. The connector used is flink-connector-tidb-cdc-2.4.1.jar

Symptom
When the Flink job is restarted a second time with recovery from the latest state (i.e., the checkpoint), it still starts reading from the earliest position.

The checkpoint data does exist.

After startup, the job still appears to be processing the full data set.

Alibaba Cloud support investigation:


Has anyone encountered the same problem?

| username: Jellybean

In principle, a connector that supports real-time incremental reads should be able to resume reading from a checkpoint. Check whether the latest version of the connector supports restarting from a checkpoint and whether additional configuration is needed to enable it.
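To make "resuming from a checkpoint" concrete, here is a minimal toy model in Python. It is not the connector's code: `SourceState`, `ToySource`, `poll`, and `checkpoint` are hypothetical names invented for illustration. It only sketches the expected behavior, where a source restored from saved state skips the full snapshot and continues from its last recorded offset, while a source started without state reads everything again (the symptom reported above).

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class SourceState:
    """Hypothetical checkpointed state of a CDC-style source."""
    snapshot_done: bool = False  # has the full snapshot already been emitted?
    change_offset: int = 0       # index of the next change-log entry to emit


class ToySource:
    """Toy source: emits a full snapshot once, then incremental changes."""

    def __init__(self, snapshot_rows: List[str], change_log: List[str],
                 restored: Optional[SourceState] = None) -> None:
        self.snapshot_rows = snapshot_rows
        self.change_log = change_log
        # Start from the restored state if one is given, otherwise from scratch.
        self.state = replace(restored) if restored is not None else SourceState()

    def poll(self) -> List[str]:
        """Emit everything that has not been emitted according to the state."""
        emitted: List[str] = []
        if not self.state.snapshot_done:
            emitted.extend(self.snapshot_rows)
            self.state.snapshot_done = True
        emitted.extend(self.change_log[self.state.change_offset:])
        self.state.change_offset = len(self.change_log)
        return emitted

    def checkpoint(self) -> SourceState:
        """Return a copy of the current state, as a checkpoint would store it."""
        return replace(self.state)


snapshot = ["row-1", "row-2"]
changes = ["change-1"]

first_run = ToySource(snapshot, changes)
first_output = first_run.poll()   # snapshot plus the first change
saved = first_run.checkpoint()

changes.append("change-2")        # a new change arrives while the job is down
resumed_output = ToySource(snapshot, changes, restored=saved).poll()
fresh_output = ToySource(snapshot, changes).poll()
```

If a real job behaves like `fresh_output` even though a checkpoint was supplied, either the state was not actually restored or the source does not persist its read position in that state.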

| username: 大飞哥online

I feel like I’m being given the runaround :rofl:

| username: xfworld

A different architecture might work better:
Use the official TiCDC component to write changes downstream to Kafka, then have Flink consume from Kafka, as shown below:

TiDB → TiKV → TiCDC → Kafka → Flink → N

Data change events are buffered in Kafka, and Flink can easily consume and process them from there.
TiCDC synchronizes with PD’s TSO and tracks the change state itself, so you only need to manage the pipeline between TiCDC and Kafka.
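As a rough illustration of what consuming change events from Kafka involves, the Python sketch below applies Canal-style JSON events to an in-memory table keyed by primary key. The helper `apply_canal_json` and the sample messages are hypothetical, and the field names (`type`, `pkNames`, `data`, `old`) are an assumption based on the general Canal JSON layout. Check the messages your TiCDC changefeed actually produces before relying on them.

```python
import json
from typing import Dict, Tuple


def apply_canal_json(table: Dict[Tuple, dict], message: str) -> None:
    """Apply one Canal-style JSON change event to an in-memory table.

    Assumed fields: "type" (INSERT/UPDATE/DELETE), "pkNames" (primary key
    column names), and "data" (list of row images after the change, or the
    deleted rows for DELETE). Updates that change the primary key are not
    handled in this sketch.
    """
    event = json.loads(message)
    pk_names = event.get("pkNames") or []
    op = event["type"]
    for row in event.get("data") or []:
        key = tuple(row[name] for name in pk_names)
        if op in ("INSERT", "UPDATE"):
            table[key] = row          # upsert: replaying the event is harmless
        elif op == "DELETE":
            table.pop(key, None)      # deleting a missing key is also harmless


# Hypothetical sample events for a table with primary key "id".
insert_msg = json.dumps({"type": "INSERT", "pkNames": ["id"],
                         "data": [{"id": 1, "name": "a"}]})
update_msg = json.dumps({"type": "UPDATE", "pkNames": ["id"],
                         "data": [{"id": 1, "name": "b"}],
                         "old": [{"name": "a"}]})
delete_msg = json.dumps({"type": "DELETE", "pkNames": ["id"],
                         "data": [{"id": 1, "name": "b"}]})

target: Dict[Tuple, dict] = {}
apply_canal_json(target, insert_msg)
apply_canal_json(target, insert_msg)  # a replayed event after a restart
apply_canal_json(target, update_msg)
```

Because each event is applied as an upsert or delete by primary key, replaying events after a restart from an earlier Kafka offset leaves the target in the same state, which is one reason this pipeline tolerates restarts well.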

You can refer to this…


PS: flink-connector-tidb-cdc is maintained by the Apache Flink CDC community, not the TiDB community…
If possible, see whether it can be improved there in the future…

| username: 消息终结者

Integrating TiCDC would consume too many resources, so the company won’t adopt it :rofl:

The flink-connector-tidb-cdc we are using is already the latest version.

| username: 消息终结者

I am already using the latest connector version. I tested with a small table as the source: after the first run, I cleared the target table. When I then restarted the job from the checkpoint (CK), the target table contained the complete data again.

| username: xfworld

Ask Alibaba Cloud to provide you with a commercial version of the flink-connector-tidb-cdc service… :smirk: :slightly_smiling_face: :stuck_out_tongue_closed_eyes:

| username: andone

Passing the blame to each other.