Using flink-tidb-cdc-connector, data synchronization from TiDB to TiDB stops halfway

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用fink-tidb-cdc-connector,tidb到tidb,数据同步一半就不同步了

| username: TiDBer_FYLY5ka1

I am running a job that synchronizes data from TiDB to TiDB, 280,000 records in total, using the flink-tidb-cdc-connector. After the task started, it synchronized 200,000 records and then stopped. The logs continuously output the following statements:

2022-12-08 14:56:31,904 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988733300739, regionId: 3495290
2022-12-08 14:56:31,905 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988733300739, regionId: 3495290
2022-12-08 14:56:32,868 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988982337537, regionId: 3495286
2022-12-08 14:56:32,869 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988982337537, regionId: 3495286
2022-12-08 14:56:32,869 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988982337537, regionId: 3495286
2022-12-08 14:56:32,905 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988995444739, regionId: 3495290
2022-12-08 14:56:32,905 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988995444739, regionId: 3495290
2022-12-08 14:56:32,905 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906988995444739, regionId: 3495290
2022-12-08 14:56:33,857 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989244481537, regionId: 3495286
2022-12-08 14:56:33,857 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989244481537, regionId: 3495286
2022-12-08 14:56:33,858 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989244481537, regionId: 3495286
2022-12-08 14:56:33,906 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989257588738, regionId: 3495290
2022-12-08 14:56:33,906 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989257588738, regionId: 3495290
2022-12-08 14:56:33,908 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989257588738, regionId: 3495290
2022-12-08 14:56:34,870 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989506625537, regionId: 3495286
2022-12-08 14:56:34,870 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989506625537, regionId: 3495286
2022-12-08 14:56:34,870 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989506625537, regionId: 3495286
2022-12-08 14:56:34,909 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989519732737, regionId: 3495290
2022-12-08 14:56:34,909 INFO org.tikv.cdc.CDCClient - handle resolvedTs: 437906989519732737, regionId: 3495290
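
For context, a TiDB-to-TiDB pipeline like the one described is usually declared as a Flink SQL source table on the tidb-cdc connector plus a JDBC sink table. The sketch below is a minimal, hypothetical example of such a job; the table name, columns, addresses, and credentials are placeholders, not the poster's actual configuration.

```sql
-- Hypothetical source table read through flink-sql-connector-tidb-cdc
CREATE TABLE orders_src (
  id BIGINT,
  amount DECIMAL(10, 2),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'tidb-cdc',
  'pd-addresses' = '127.0.0.1:2379',  -- PD endpoints of the upstream TiDB cluster
  'database-name' = 'test',
  'table-name' = 'orders'
);

-- Hypothetical sink table writing to the downstream TiDB over the MySQL protocol (port 4000)
CREATE TABLE orders_sink (
  id BIGINT,
  amount DECIMAL(10, 2),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://127.0.0.1:4000/test',
  'table-name' = 'orders',
  'username' = 'root',
  'password' = '123456'
);

-- Copy the full snapshot and then the change stream from source to sink
INSERT INTO orders_sink SELECT * FROM orders_src;
```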

| username: Billmay表妹 | Original post link

【TiDB Version】
【Flink Version】
【Flink CDC Version】
Please post them.

| username: TiDBer_FYLY5ka1 | Original post link

TiDB v5.3.0
flink-sql-connector-tidb-cdc 2.2.1

| username: TiDBer_FYLY5ka1 | Original post link

Flink 1.14.3

| username: Billmay表妹 | Original post link

Refer to this post; you can try upgrading to version 2.3.0 first~

| username: Billmay表妹 | Original post link

  1. You can ask about this error in the Flink CDC community. There are several unresolved bugs on the TiDB side.
  2. Are there any anomalies in the ticdc logs? You can post them here.

| username: TiDBer_FYLY5ka1 | Original post link

There are no exceptions in the logs. It’s just that the existing (full) data could not be fully synchronized, while the incremental data was synchronized normally. I tried resynchronizing the full data with version 2.3.0, and it now works.
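
The distinction the poster draws between existing (full) data and incremental data corresponds to the connector's startup modes. A hedged illustration follows; the table definition itself is hypothetical and the option values are examples only.

```sql
-- 'initial' reads a snapshot of the existing rows first, then switches to incremental changes;
-- 'latest-offset' skips the snapshot and captures only changes made after the job starts.
CREATE TABLE orders_cdc_src (
  id BIGINT,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'tidb-cdc',
  'pd-addresses' = '127.0.0.1:2379',
  'database-name' = 'test',
  'table-name' = 'orders',
  'scan.startup.mode' = 'initial'
);
```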

| username: TiDBer_FYLY5ka1 | Original post link

Thank you :blush:

| username: Billmay表妹 | Original post link

Okay, that might have been caused by a bug in the older version. Let us know if you encounter any other issues~

| username: TiDBer_FYLY5ka1 | Original post link

I tried to synchronize a larger table (TiDB to TiDB, tens of millions of rows) and saw the following warning, followed by a JobManager heartbeat timeout:
2022-12-08 17:12:32,556 INFO com.ververica.cdc.connectors.tidb.TiKVRichParallelSourceFunction - read snapshot events
2022-12-08 17:12:32,860 WARN org.tikv.common.region.StoreHealthyChecker - store [127.0.0.1:3930] is not reachable
2022-12-08 17:20:55,822 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - The heartbeat of JobManager with id ae84acffcbb8019922f9dfbfa758b959 timed out.

Then the task stopped:
FlinkException: Disconnect from JobManager responsible for 37783358c880142452cfdc23a76b97c4.
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.disconnectJobManagerConnection(TaskExecutor.java:1654)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.disconnectAndTryReconnectToJobManager(TaskExecutor.java:1218)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$3900(TaskExecutor.java:183)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.lambda$handleJobManagerConnectionLoss$0(TaskExecutor.java:2387)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.handleJobManagerConnectionLoss(TaskExecutor.java:2385)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.notifyHeartbeatTimeout(TaskExecutor.java:2368)
    at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:155)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:455)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:455)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.util.concurrent.TimeoutException: The heartbeat of JobManager with id ae84acffcbb8019922f9dfbfa758b959 timed out.
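
Two different things appear in this output. The WARN from org.tikv.common.region.StoreHealthyChecker says the TiKV Java client cannot reach a store advertised at 127.0.0.1:3930 (a loopback address; 3930 is the default TiFlash service port), which may be an advertised-address issue rather than part of the failure itself. The task then died on a generic Flink JobManager heartbeat timeout, whose limit is the heartbeat.timeout setting in flink-conf.yaml. If the snapshot read of a large table is slow, the tidb-cdc connector also exposes TiKV client timeouts as table options; the snippet below is a hedged sketch with example values only, not a verified fix for this case.

```sql
-- Hypothetical source table: raising the TiKV client timeouts on the tidb-cdc source
-- (option names per the flink-sql-connector-tidb-cdc documentation; values are examples)
CREATE TABLE big_table_src (
  id BIGINT,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'tidb-cdc',
  'pd-addresses' = '127.0.0.1:2379',
  'database-name' = 'test',
  'table-name' = 'big_table',
  'tikv.grpc.timeout_in_ms' = '60000',       -- timeout for individual TiKV gRPC requests
  'tikv.grpc.scan_timeout_in_ms' = '60000'   -- timeout for scans during the snapshot phase
);
```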

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.