TiCDC Kafka Sync Exception

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TICDC 同步 kafka 异常

| username: zyq16

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.0.6
【Issue Description】
An error occurs when writing to Kafka. Modifying the Kafka parameter max-message-bytes restores synchronization, but when the same issue occurs again the parameter has to be modified again. Oddly, changing max-message-bytes in either direction, larger or smaller, restores synchronization.
【Error Message】
[2024/06/03 20:59:52.366 +08:00] [WARN] [json.go:404] [“Single message too large”] [max-message-size=10485880] [length=10560460] [table=db_name.table_name]
[2024/06/03 20:59:52.443 +08:00] [ERROR] [processor.go:305] [“error on running processor”] [capture=xx.xx.xx.xx:8300] [changefeed=task-xxx-xxx-kafka-v2] [error=“[CDC:ErrJSONCodecRowTooLarge]json codec single row too large”] [errorVerbose=“[CDC:ErrJSONCodecRowTooLarge]json codec single row too large\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:156\ngithub.com/pingcap/tiflow/cdc/sink/codec.(*JSONEventBatchEncoder).AppendRowChangedEvent\n\tgithub.com/pingcap/tiflow@/cdc/sink/codec/json.go:406\ngithub.com/pingcap/tiflow/cdc/sink.(*mqSink).runWorker\n\tgithub.com/pingcap/tiflow@/cdc/sink/mq.go:353\ngithub.com/pingcap/tiflow/cdc/sink.(*mqSink).run.func1\n\tgithub.com/pingcap/tiflow@/cdc/sink/mq.go:282\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357”]
[2024/06/03 20:59:52.443 +08:00] [ERROR] [processor.go:144] [“run processor failed”] [changefeed=task-xxx-xxx-kafka-v2] [capture=xx.xx.xx.xx:8300] [error=“[CDC:ErrJSONCodecRowTooLarge]json codec single row too large”] [errorVerbose=“[CDC:ErrJSONCodecRowTooLarge]json codec single row too large\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:156\ngithub.com/pingcap/tiflow/cdc/sink/codec.(*JSONEventBatchEncoder).AppendRowChangedEvent\n\tgithub.com/pingcap/tiflow@/cdc/sink/codec/json.go:406\ngithub.com/pingcap/tiflow/cdc/sink.(*mqSink).runWorker\n\tgithub.com/pingcap/tiflow@/cdc/sink/mq.go:353\ngithub.com/pingcap/tiflow/cdc/sink.(*mqSink).run.func1\n\tgithub.com/pingcap/tiflow@/cdc/sink/mq.go:282\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357”]
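The failure condition is visible directly in the numbers of the WARN line above: the JSON-encoded row (`length=10560460`) exceeds the configured limit (`max-message-size=10485880`, roughly 10 MB minus protocol overhead). A minimal sketch of the check TiCDC is performing, with the values copied from the log:

```python
# Values copied from the WARN line in the log above.
max_message_size = 10_485_880  # configured single-message limit
row_length = 10_560_460        # size of the JSON-encoded row

# A single row cannot be split across Kafka messages, so TiCDC has to
# reject the row outright once its encoding exceeds the limit.
if row_length > max_message_size:
    print(f"single row too large: over the limit by "
          f"{row_length - max_message_size} bytes")
```

Here the row overshoots the limit by 74,580 bytes, which matches the `ErrJSONCodecRowTooLarge` error in the subsequent ERROR lines.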

| username: flow-PingCAP | Original post link

Is there a limit on the row width of a table?

| username: zyq16 | Original post link

The table definition contains two LONGTEXT and three TEXT fields.

| username: LingJin | Original post link

The core of the issue is that the max-message-bytes parameter was not configured on the changefeed. In older versions it defaults to 10 MB. The messages you are syncing exceed that value, so the error naturally follows.
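For reference, in TiCDC this limit is set per changefeed through the `max-message-bytes` query parameter of the Kafka sink URI. A sketch of such a command follows; the PD address, broker address, topic name, and changefeed ID are placeholders, and the chosen value should not exceed what the Kafka broker itself accepts:

```shell
# Sketch only: addresses, topic, and changefeed ID are placeholders.
# max-message-bytes must stay at or below the broker/topic limit,
# otherwise Kafka will reject the oversized message instead of TiCDC.
cdc cli changefeed create \
  --pd=http://10.0.0.1:2379 \
  --changefeed-id="kafka-task" \
  --sink-uri="kafka://10.0.0.2:9092/cdc-topic?kafka-version=2.4.0&max-message-bytes=67108864"
```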

This is expected.

| username: LingJin | Original post link

What is your question?

| username: zyq16 | Original post link

The max-message-bytes parameter is configured on the Kafka side, and the same value is also configured on the TiCDC side for pushing to Kafka; the two configurations match.

| username: zyq16 | Original post link

TiCDC reports the error [CDC:ErrJSONCodecRowTooLarge] json codec single row too large when pushing to Kafka.

| username: 友利奈绪 | Original post link

This error indicates that TiCDC encountered an issue where the JSON encoding of a single row of data was too large to be written to Kafka. This is usually due to the row containing too many fields or excessively large field values, exceeding Kafka’s message size limit.

To address this issue, you can consider the following solutions:

1. Check Table Structure Design

Examine the design of the relevant table, especially if VARCHAR, TEXT, or BLOB type fields contain excessively large data. If so, consider optimizing the data storage method, such as splitting large data into other storage locations and keeping references or summaries in the main table.

2. Reduce Fields

If the single row of data contains too many fields, consider whether all fields need to be synchronized to Kafka. Selectively synchronize necessary fields based on actual needs to reduce the size of the single row of data.

3. Use Avro Format

Consider using Avro format instead of JSON format for data encoding. Avro format is usually more compact and can reduce the size during data transmission.

4. Adjust CDC Configuration

In TiCDC’s configuration, you can adjust the JSON encoding parameters corresponding to the sink, such as the max-message-bytes parameter, to appropriately increase the size limit of a single message.

5. Increase Partitions

If the single row of data is large, consider increasing the number of partitions in the Kafka Topic to more finely divide the data, thereby reducing the pressure on a single partition.

6. Batch Processing

In TiCDC, you can set an appropriate data batching strategy to process large data in batches according to certain rules, avoiding issues caused by excessively large single rows of data.

7. Monitor Data Changes

Regularly monitor changes in table data, especially changes in field values, to promptly identify and address potential issues.
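Regarding item 4 above, note that the message-size limit lives in two places and should be kept consistent: the Kafka broker/topic configuration and the TiCDC sink URI. A hedged sketch of the broker-side change (broker address, topic name, and the 64 MB value are illustrative):

```shell
# Broker side: raise the per-message limit for the target topic.
# Address, topic name, and value are placeholders.
kafka-configs.sh --bootstrap-server 10.0.0.2:9092 \
  --entity-type topics --entity-name cdc-topic \
  --alter --add-config max.message.bytes=67108864

# TiCDC side: keep max-message-bytes in the changefeed's sink URI at or
# below the broker/topic limit so TiCDC fails fast rather than Kafka
# rejecting messages after they are sent.
```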