Issues with CDC synchronization to Kafka changefeed affecting MySQL change feed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc同步到kafka的changefeed出现问题,影响了mysql的change feed

| username: db_user

[TiDB Usage Environment] Production Environment / Testing / Poc Production Environment
[TiDB Version] v4.0.13

Background:
Upstream TIDB, using cdc to sync to MySQL, Kafka, then because Kafka service crashed, causing issues with the changefeed syncing to Kafka. Then used cdc cli to execute pause twice and resume once. The second pause did not change the status viewed in the list, no other operations were performed during this period, and then the changefeed syncing to MySQL experienced delays.

Solution:
After deleting the task syncing to Kafka, the sync returned to normal.

Monitoring:


You can see a significant decrease in the amount of get during the sync delay time.

cdc log

[2023/04/14 07:17:15.795 +00:00] [INFO] [owner.go:622] ["stale task status is not deleted, wait metadata cleaned to create new changefeed"] ["task status"="{\"tables\":{\"1306\":{\"start-ts\":440271347866402910,\"mark-table-id\":0},\"1462\":{\"start-ts\":440271347866402910,\"mark-table-id\":0},\"1995\":{\"start-ts\":440271347866402910,\"mark-table-id\":0},\"477\":{\"start-ts\":440271347866402910,\"mark-table-id\":0},\"517\":{\"start-ts\":440271347866402910,\"mark-table-id\":0},\"677\":{\"start-ts\":440271347866402910,\"mark-table-id\":0}},\"operation\":null,\"admin-job-type\":1}"] [changefeed=cdc-kafka]
[2023/04/14 07:17:15.879 +00:00] [ERROR] [kafka.go:227] ["close async client with error"] [error="kafka: Failed to deliver 9 messages."]
[2023/04/14 07:17:15.879 +00:00] [ERROR] [processor.go:1428] ["processor receives redundant error"] [error="[CDC:ErrAdminStopProcessor]stop processor by admin command"] [errorVerbose="[CDC:ErrAdminStopProcessor]stop processor by admin command\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:156\ngithub.com/pingcap/ticdc/cdc.(*oldProcessor).stop\n\tgithub.com/pingcap/ticdc@/cdc/processor.go:1312\ngithub.com/pingcap/ticdc/cdc.(*Capture).handleTaskEvent\n\tgithub.com/pingcap/ticdc@/cdc/capture.go:283\ngithub.com/pingcap/ticdc/cdc.(*Capture).Run\n\tgithub.com/pingcap/ticdc@/cdc/capture.go:211\ngithub.com/pingcap/ticdc/cdc.(*Server).run.func4\n\tgithub.com/pingcap/ticdc@/cdc/server.go:272\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
[2023/04/14 07:17:15.962 +00:00] [INFO] [owner.go:290] ["Find new changefeed"] [info="{\"sink-uri\":\"***\",\"opts\":{\"max-message-bytes\":\"67108864\"},\"create-time\":\"2022-08-10T08:38:01.259768036Z\",\"start-ts\":435190675862716506,\"target-ts\":0,\"admin-job-type\":2,\"sort-engine\":\"unified\",\"config\":{\"case-sensitive\":true,\"enable-old-value\":true,\"force-replicate\":false,\"check-gc-safe-point\":true,\"filter\":{\"rules\":[],\"ignore-txn-start-ts\":null,\"ddl-allow-list\":null},\"mounter\":{\"worker-num\":16},\"sink\":{\"dispatchers\":null,\"protocol\":\"canal-json\"},\"cyclic-replication\":{\"enable\":false,\"replica-id\":0,\"filter-replica-ids\":null,\"id-buckets\":0,\"sync-ddl\":false},\"scheduler\":{\"type\":\"table-number\",\"polling-time\":-1}},\"state\":\"normal\",\"history\":[1681455243649],\"error\":null,\"sync-point-enabled\":false,\"sync-point-interval\":600000000000,\"creator-version\":\"v4.0.13\"}"] [changefeed=cdc-kafka] ["checkpoint ts"=440783380765474971]
[2023/04/14 07:17:16.074 +00:00] [INFO] [kafka.go:320] ["Starting kafka sarama producer ..."] [config="{\"PartitionNum\":1,\"ReplicationFactor\":1,\"Version\":\"2.6.2\",\"MaxMessageBytes\":67108864,\"Compression\":\"none\",\"ClientID\":\"\",\"Credential\":{\"ca-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"cert-allowed-cn\":null},\"TopicPreProcess\":true}"]
[2023/04/14 07:17:16.076 +00:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=362] [regionID=128924] [startKey=6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006c0000000000fa] [endKey=6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006d0000000000fa] [checkpointTs=440783380765474971]
[2023/04/14 07:17:16.076 +00:00] [INFO] [client.go:814] ["creating new stream to store to send request"] [regionID=128924] [requestID=484592] [storeID=1] [addr=ip.ip:20160]
[2023/04/14 07:17:16.076 +00:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=363] [regionID=128924] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=440783380765474971]
[2023/04/14 07:17:16.076 +00:00] [INFO] [client.go:814] ["creating new stream to store to send request"] [regionID=128924] [requestID=484593] [storeID=1] [addr=ip.ip:20160]
[2023/04/14 07:17:16.081 +00:00] [INFO] [puller.go:217] ["puller is initialized"] [duration=7.485094ms] [changefeed=] [tableID=-1] [spans="[\"[6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006c0000000000fa, 6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006d0000000000fa)\",\"[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)\"]"] [resolvedTs=440783380765474971]
[2023/04/14 07:17:16.539 +00:00] [WARN] [client.go:1335] ["drop resolved ts due to region feed stopped"] [regionID=157636] [requestID=236727] [addr=ip.ip:20160]
#大量drop resolved ts due to region feed stopped
[2023/04/14 07:18:16.298 +00:00] [INFO] [client.go:1150] ["stream to store closed"] [addr=ip.ip:20160] [storeID=1]
[2023/04/14 07:18:16.307 +00:00] [INFO] [client.go:1150] ["stream to store closed"] [addr=ip.ip:20160] [storeID=1]
[2023/04/14 07:18:16.855 +00:00] [WARN] [client.go:1335] ["drop resolved ts due to region feed stopped"] [regionID=157636] [requestID=209288] [addr=ip.ip:20160]
[2023/04/14 07:18:16.865 +00:00] [WARN] [client.go:1335] ["drop resolved ts due to region feed stopped"] [regionID=157636] [requestID=236727] [addr=ip.ip:20160]
[2023/04/14 07:18:17.885 +00:00] [WARN] [client.go:1335] ["drop resolved ts due to region feed stopped"] [regionID=157636] [requestID=236727] [addr=ip.ip:20160]
[2023/04/14 07:18:17.892 +00:00] [WARN] [client.go:1335] ["drop resolved ts due to region feed stopped"] [regionID=157636] [requestID=209288] [addr=ip.ip:20160]
[2023/04/14 07:18:18.301 +00:00] [WARN] [owner.go:660] ["create changefeed failed, retry later"] [changefeed=cdc-kafka] [error="[CDC:ErrKafkaNewSaramaProducer]kafka: client has run out of available brokers to talk to (Is your cluster reachable?)"] [errorVerbose="[CDC:ErrKafkaNewSaramaProducer]kafka: client has run out of available brokers to talk to (Is your cluster reachable?)\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByCause\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:279\ngithub.com/pingcap/ticdc/pkg/errors.WrapError\n\tgithub.com/pingcap/ticdc@/pkg/errors/helper.go:28\ngithub.com/pingcap/ticdc/cdc/sink/producer/kafka.NewKafkaSaramaProducer\n\tgithub.com/pingcap/ticdc@/cdc/sink/producer/kafka/kafka.go:330\ngithub.com/pingcap/ticdc/cdc/sink.newKafkaSaramaSink\n\tgithub.com/pingcap/ticdc@/cdc/sink/mq.go:477\ngithub.com/pingcap/ticdc/cdc/sink.init.1.func3\n\tgithub.com/pingcap/ticdc@/cdc/sink/sink.go:81\ngithub.com/pingcap/ticdc/cdc/sink.NewSink\n\tgithub.com/pingcap/ticdc@/cdc/sink/sink.go:113\ngithub.com/pingcap/ticdc/cdc.(*Owner).newChangeFeed\n\tgithub.com/pingcap/ticdc@/cdc/owner.go:433\ngithub.com/pingcap/ticdc/cdc.(*Owner).loadChangeFeeds\n\tgithub.com/pingcap/ticdc@/cdc/owner.go:634\ngithub.com/pingcap/ticdc/cdc.(*Owner).run\n\tgithub.com/pingcap/ticdc@/cdc/owner.go:1357\ngithub.com/pingcap/ticdc/cdc.(*Owner).Run\n\tgithub.com/pingcap/ticdc@/cdc/owner.go:1252\ngithub.com/pingcap/ticdc/cdc.(*Server).campaignOwnerLoop\n\tgithub.com/pingcap/ticdc@/cdc/server.go:177\ngithub.com/pingcap/ticdc/cdc.(*Server).run.func1\n\tgithub.com/pingcap/ticdc@/cdc/server.go:260\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
[2023/04/14 07:18:18.343 +00:00] [INFO] [owner.go:290] ["Find new changefeed"] [info="{\"sink-uri\":\"***\",\"opts\":{\"max-message-bytes\":\"67108864\"},\"create-time\":\"2022-08-10T08:38:01.259768036Z\",\"start-ts\":435190675862716506,\"target-ts\":0,\"admin-job-type\":2,\"sort-engine\":\"unified\",\"config\":{\"case-sensitive\":true,\"enable-old-value\":true,\"force-replicate\":false,\"check-gc-safe-point\":true,\"filter\":{\"rules\":[],\"ignore-txn-start-ts\":null,\"ddl-allow-list\":null},\"mounter\":{\"worker-num\":16},\"sink\":{\"dispatchers\":null,\"protocol\":\"canal-json\"},\"cyclic-replication\":{\"enable\":false,\"replica-id\":0,\"filter-replica-ids\":null,\"id-buckets\":0,\"sync-ddl\":false},\"scheduler\":{\"type\":\"table-number\",\"polling-time\":-1}},\"state\":\"normal\",\"history\":[1681456698295],\"error\":{\"addr\":\"ip.ip:8300\",\"code\":\"CDC-owner-1001\",\"message\":\"[CDC:ErrKafkaNewSaramaProducer]kafka: client has run out of available brokers to talk to (Is your cluster reachable?)\"},\"sync-point-enabled\":false,\"sync-point-interval\":600000000000,\"creator-version\":\"v4.0.13\"}"] [changefeed=cdc-kafka] ["checkpoint ts"=440783380765474971]
[2023/04/14 07:18:18.447 +00:00] [INFO] [kafka.go:320] ["Starting kafka sarama producer ..."] [config="{\"PartitionNum\":1,\"ReplicationFactor\":1,\"Version\":\"2.6.2\",\"MaxMessageBytes\":67108864,\"Compression\":\"none\",\"ClientID\":\"\",\"Credential\":{\"ca-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"cert-allowed-cn\":null},\"TopicPreProcess\":true}"]
[2023/04/14 07:18:18.449 +00:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=364] [regionID=128924] [startKey=6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006c0000000000fa] [endKey=6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006d0000000000fa] [checkpointTs=440783380765474971]
[2023/04/14 07:18:18.449 +00:00] [INFO] [client.go:814] ["creating new stream to store to send request"] [regionID=128924] [requestID=484596] [storeID=1] [addr=ip.ip:20160]
[2023/04/14 07:18:18.449 +00:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=365] [regionID=128924] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=440783380765474971]
[2023/04/14 07:18:18.449 +00:00] [INFO] [client.go:814] ["creating new stream to store to send request"] [regionID=128924] [requestID=484597] [storeID=1] [addr=ip.ip:20160]
[2023/04/14 07:18:18.453 +00:00] [INFO] [puller.go:217] ["puller is initialized"] [duration=7.314801ms] [changefeed=] [tableID=-1] [spans="[\"[6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006c0000000000fa, 6d44444c4a6f624cff69ff737400000000ff0000f90000000000ff00006d0000000000fa)\",\"[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006
| username: neilshen | Original post link

It might be this one: mysql changefeed blocked by abnormal kafka changefeed · Issue #4241 · pingcap/tiflow · GitHub

In general, older versions of TiCDC have an issue where an exception in the downstream Kafka could cause the TiCDC node to get stuck, affecting other changefeeds. It is recommended to upgrade TiCDC to the latest version of 6.1 or 6.5.

| username: db_user | Original post link

Thank you, thank you. I have looked at this, but the logs are not very consistent, and it doesn’t seem to be the same issue. I wonder if there are any plans for CDC to become an independent component that won’t be affected by the cluster version. Otherwise, the communication difficulty during upgrades is still quite significant.

| username: neilshen | Original post link

Due to some technical limitations, TiCDC 6.1 currently supports a minimum of TiDB cluster version 5.1.0-alpha.

| username: db_user | Original post link

Alright, I’ll temporarily bypass it by using the delete task method. Thank you, teacher.

| username: db_user | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.