TiCDC Panics Some Time After Connecting to Kafka

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 接入 kafka 后过一段时间报panic (TiCDC reports a panic some time after being connected to Kafka)

| username: xxxxxxxx

Version: 4.0.13

After deploying TiCDC on a production TiDB cluster, we created 18 tasks in total. 17 of them replicate binlog data to the downstream Kafka normally, but for the remaining task the corresponding data never shows up in Kafka.

On investigation, we found that TiCDC keeps restarting:

  1. Checking the cluster with tiup cluster display shows the TiCDC status switching back and forth between Down and Up.
  2. In the TiCDC logs, cdc.log keeps growing but contains only INFO and WARN entries, while cdc_stderr.log repeatedly records segments like the following:
goroutine 2359 [running]:
github.com/pingcap/ticdc/cdc/sink/codec.rowEventToMaxwellMessage(0xc0228f5900, 0x2fa3e80, 0xc020205960)
	github.com/pingcap/ticdc@/cdc/sink/codec/maxwell.go:105 +0xfae
github.com/pingcap/ticdc/cdc/sink/codec.(*MaxwellEventBatchEncoder).AppendRowChangedEvent(0xc020205960, 0xc0228f5900, 0x3, 0x2, 0x1)
	github.com/pingcap/ticdc@/cdc/sink/codec/maxwell.go:160 +0x2f
github.com/pingcap/ticdc/cdc/sink.(*mqSink).runWorker(0xc020ba0750, 0x2f70da0, 0xc020256800, 0x2, 0x0, 0x0)
	github.com/pingcap/ticdc@/cdc/sink/mq.go:351 +0x3c8
github.com/pingcap/ticdc/cdc/sink.(*mqSink).run.func1(0xc013f57768, 0x0)
	github.com/pingcap/ticdc@/cdc/sink/mq.go:281 +0x46
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc022794fc0, 0xc020c0b4e0)
	golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:54 +0x66
panic: interface conversion: interface {} is string, not []uint8

goroutine 603 [running]:
github.com/pingcap/ticdc/cdc/sink/codec.rowEventToMaxwellMessage(0xc008f16180, 0x2fa3e80, 0xc000346220)
	github.com/pingcap/ticdc@/cdc/sink/codec/maxwell.go:105 +0xfae
github.com/pingcap/ticdc/cdc/sink/codec.(*MaxwellEventBatchEncoder).AppendRowChangedEvent(0xc000346220, 0xc008f16180, 0x3, 0x2, 0x1)
	github.com/pingcap/ticdc@/cdc/sink/codec/maxwell.go:160 +0x2f
github.com/pingcap/ticdc/cdc/sink.(*mqSink).runWorker(0xc000e9c360, 0x2f70da0, 0xc0021b6300, 0x2, 0x0, 0x0)
	github.com/pingcap/ticdc@/cdc/sink/mq.go:351 +0x3c8
github.com/pingcap/ticdc/cdc/sink.(*mqSink).run.func1(0xc00216cf68, 0x0)
	github.com/pingcap/ticdc@/cdc/sink/mq.go:281 +0x46
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc000176660, 0xc00000c220)
	golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:54 +0x66
panic: interface conversion: interface {} is string, not []uint8

This looks like a bug. For now, the only workaround is to recreate the TiCDC task every few days. What conditions could trigger this issue?

| username: 大飞哥online | Original post link

How many tasks are running on that node?

Alternatively, stop the abnormal task and start a new one on another machine.

| username: Fly-bird | Original post link

Try removing and recreating this task. There may be too many tasks for the available resources, so check the resource utilization of the CDC nodes.

| username: 像风一样的男子 | Original post link

This is a known bug in older versions; see the GitHub issue "json codec met panic: interface conversion: interface {} is string, not []uint8" (Issue #2758 in pingcap/tiflow). Your TiDB version is quite old, so it's best to upgrade. Later versions fix many CDC-related issues.

| username: xxxxxxxx | Original post link

What conditions trigger this bug? I mainly want to understand them so I can see whether it can be avoided.

| username: 像风一样的男子 | Original post link

It's unclear what triggers it. Since it's a known bug, don't spend more time on it; just upgrade.

| username: 大飞哥online | Original post link

Get the upgrade on the schedule.

| username: xxxxxxxx | Original post link

We've gone 1.0 -> 2.0 -> 2.1 -> 4.0, and every time there's a problem, the only answer is to upgrade. That kind of solution leaves me speechless.

| username: 像风一样的男子 | Original post link

That's normal. When the software you develop has a problem, you release upgrades and patches too, right?

| username: ti-tiger | Original post link

The panic occurs when TiCDC's Maxwell-format encoder encounters a column value of an unexpected data type (a string where it expects []uint8, i.e. []byte). This problem has been fixed in TiCDC v4.0.14.
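The message "interface conversion: interface {} is string, not []uint8" is what Go prints when a single-value type assertion such as v.([]byte) is applied to an interface{} holding a string. The sketch below is not TiCDC's actual encoder code; the helpers columnValueToString and unsafeConvert are hypothetical. It reproduces the same kind of panic and shows a defensive type switch that accepts either representation of a column value.

```go
package main

import "fmt"

// unsafeConvert (hypothetical) mirrors the failing pattern: a single-value
// type assertion panics if the dynamic type is not []byte.
func unsafeConvert(v interface{}) string {
	return string(v.([]byte))
}

// columnValueToString (hypothetical) tolerates both []byte and string,
// and returns an error instead of panicking on any other type.
func columnValueToString(v interface{}) (string, error) {
	switch val := v.(type) {
	case []byte:
		return string(val), nil
	case string:
		return val, nil
	case nil:
		return "", nil
	default:
		return "", fmt.Errorf("unexpected column value type %T", v)
	}
}

func main() {
	// Recover so the demo prints the panic message instead of crashing.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("recovered:", r)
			}
		}()
		_ = unsafeConvert("hello")
	}()

	s, err := columnValueToString("hello")
	fmt.Println(s, err)
	b, err := columnValueToString([]byte("world"))
	fmt.Println(b, err)
	_, err = columnValueToString(42)
	fmt.Println(err)
}
```

A type switch (or the two-value form v, ok := x.([]byte)) turns an unexpected type into a handled case or an error, rather than a panic that takes down the whole sink goroutine.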

| username: xfworld | Original post link

Patch-version upgrades are fine: they mainly fix bugs and have little impact on anything else.

Still, back up your data before upgrading.

| username: Billmay表妹 | Original post link

4.0.16 is the latest 4.0.x release, so you could upgrade to it first.

If 4.x meets your needs, upgrade within that minor version first and verify the fix; there's no need to jump to 7.x all at once.

New releases exist precisely to give you quick fixes for problems like the ones you encountered.

| username: 喵父666 | Original post link

Version 4 is definitely a bit outdated. I suggest upgrading to a newer version.
