After upgrading to v6.1.5, TiCDC frequently restarts

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 升级v6.1.5后,ticdc频繁重启

| username: wwb519

[TiDB Usage Environment]
Production environment
[TiDB Version]
v6.1.5
[Reproduction Path]
Upgraded from v4.0.15 to v6.1.5.
[Encountered Issue: Symptoms and Impact]
The TiCDC process restarts frequently.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachments: Screenshots/Logs/Monitoring]

cdc1.zip (3.6 MB)

| username: Billmay表妹 | Original post link

Post the error message~

Just share your resource configuration.

| username: wwb519 | Original post link

[2023/05/15 22:12:29.282 +08:00] [WARN] [client.go:171] ["peer message client detected error, restarting"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.110.45.12:8300: connect: connection refused\""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.110.45.12:8300: connect: connection refused\"\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/juju_adaptor.go:15\ngithub.com/pingcap/tiflow/pkg/p2p.(*MessageClient).launchStream\n\tgithub.com/pingcap/tiflow/pkg/p2p/client.go:187\ngithub.com/pingcap/tiflow/pkg/p2p.(*MessageClient).Run\n\tgithub.com/pingcap/tiflow/pkg/p2p/client.go:166\ngithub.com/pingcap/tiflow/pkg/p2p.(*messageRouterImpl).GetClient.func1\n\tgithub.com/pingcap/tiflow/pkg/p2p/message_router.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1594"]

| username: weixiaobing | Original post link

It looks like there is a problem with your CDC node. Is it reachable? The log shows: transport: Error while dialing dial tcp 10.110.45.12:8300: connect: connection refused

| username: wwb519 | Original post link

TiCDC replicates to MySQL and Kafka. After I stopped the replication to Kafka, it returned to normal.

| username: wwb519 | Original post link

The default value of tidb_enable_clustered_index is INT_ONLY, which means that the clustered index is only enabled for tables with integer primary keys. If you want to enable the clustered index for all tables, you can set it to ON.

| username: wwb519 | Original post link

The CDC logs show no specific error messages, but according to the operating system logs, the CDC process keeps restarting.

| username: weixiaobing | Original post link

Is there no obvious error log in the cdc.log before the restart?

| username: wwb519 | Original post link

No, I have uploaded cdc.log. There are error logs in cdc_stderr.log.
cdc1.zip (3.6 MB)

[tidb@cnhw-vm-ticdc-am01 log]$ tail -f cdc_stderr.log
github.com/pingcap/tiflow/cdc/sink/mq/codec.(*MaxwellEventBatchEncoder).AppendRowChangedEvent(0xc004ee9cc8, {0xc0079cde94?, 0x0?}, {0x0?, 0xc0079cde68?}, 0x1?)
github.com/pingcap/tiflow/cdc/sink/mq/codec/maxwell.go:172 +0x25
github.com/pingcap/tiflow/cdc/sink/mq/codec.(*encoderGroup).runEncoder(0xc008c3de00, {0x3abadc8, 0xc019b96240}, 0x3)
github.com/pingcap/tiflow/cdc/sink/mq/codec/encoder_group.go:114 +0x35d
github.com/pingcap/tiflow/cdc/sink/mq/codec.(*encoderGroup).Run.func2()
github.com/pingcap/tiflow/cdc/sink/mq/codec/encoder_group.go:93 +0x2c
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:72 +0xa5

| username: Min_Chen | Original post link

cdc.log shows no obvious errors; the process simply restarted. The specific error should be in stderr. Please send the complete cdc_stderr.log. Thanks.

| username: wwb519 | Original post link

cdc_stderr.log (1.9 MB)

| username: wwb519 | Original post link

The error message is the same as the one in pingcap/tiflow issue #2758, "json codec met panic: interface conversion: interface {} is string, not []uint8". Has this bug reappeared in v6.1.5?
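For readers unfamiliar with this kind of panic: in Go, "interface conversion: interface {} is string, not []uint8" is what an unchecked type assertion to `[]byte` produces when the value is actually a `string`. The sketch below only illustrates that general mechanism. It is not TiCDC's encoder code, and `uncheckedBytes` and `asBytes` are hypothetical names.

```go
package main

import "fmt"

// uncheckedBytes mirrors the failing pattern: v.([]byte) panics when v
// holds any other type. The deferred recover turns the panic into an
// error so this demo keeps running.
func uncheckedBytes(v interface{}) (b []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return v.([]byte), nil
}

// asBytes is a defensive alternative: a type switch accepts both
// []byte and string, and reports other types as an error.
func asBytes(v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case []byte:
		return x, nil
	case string:
		return []byte(x), nil
	default:
		return nil, fmt.Errorf("unsupported value type %T", v)
	}
}

func main() {
	var col interface{} = "abc" // a string where []byte was expected

	_, err := uncheckedBytes(col)
	fmt.Println(err)

	b, err := asBytes(col)
	fmt.Println(string(b), err)
}
```

Without the `recover`, the unchecked assertion would terminate the whole process, which would be consistent with a stack trace showing up in cdc_stderr.log while cdc.log shows nothing.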