Configuration Issues for TiCDC Sync to Kafka

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc同步到kafka配置问题 (TiCDC sync to Kafka configuration issue)

| username: xxxxxxxx

TiDB version 6.1.7

Issue: The downstream Kafka limits the network packet size to 4 MB, but with the following TiCDC configuration (one changefeed for a single table), the error [Message was too large, server rejected it to avoid allocation error.] is still triggered:
"max-batch-size": "4" (the official documentation states this parameter is ineffective; the default is 16, but even at 16 it should not cause this error)
"max-message-bytes": "262144"
"protocol": "maxwell"

PS: The Kafka max message size is capped at 4 MB and cannot be modified, so only the TiCDC-side configuration can be controlled. I hit this issue before on version 4.0.13; the official recommendation was to upgrade. After upgrading to 4.0.16 the issue remained, and now on 6.1.7 it still happens, which is very frustrating.
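
For reference, these sink parameters are normally passed as query parameters on the changefeed's Kafka sink URI. A minimal sketch of what such a changefeed might look like; the PD address, broker address, topic, and changefeed ID are placeholders, not taken from the original post:

```shell
# Hypothetical reconstruction of the poster's changefeed; PD address,
# broker address, topic, and changefeed ID are placeholders.
cdc cli changefeed create \
  --pd="http://127.0.0.1:2379" \
  --changefeed-id="single-table-to-kafka" \
  --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=maxwell&max-message-bytes=262144&max-batch-size=4"
```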

| username: 芮芮是产品 | Original post link

You can change the configuration.

| username: Fly-bird | Original post link

This configuration item can be modified. What about your Kafka configuration?

Add these parameters to config/server.properties:

```properties
# Maximum bytes the broker accepts for a single message
message.max.bytes=200000000
# Maximum bytes the broker can replicate for a message
replica.fetch.max.bytes=204857600
# Maximum bytes a consumer can fetch for a message
fetch.message.max.bytes=204857600
```

| username: xxxxxxxx | Original post link

What I mean is that the Kafka cluster belongs to another team and is not managed by the DBA. The DBA is only a client; the server side is managed centrally by others and does not support configuration changes.

| username: ShawnYan | Original post link

You can build your own Kafka.

| username: 芮芮是产品 | Original post link

Your downstream Kafka is rejecting the message.

| username: RenlySir | Original post link

You should ask the downstream Kafka team to modify the configuration and raise the size limits.

| username: 像风一样的男子 | Original post link

Data is like water; if the downstream pipe becomes smaller, you either need to limit the CDC flow or increase the capacity of the downstream Kafka pipe.

| username: LingJin | Original post link

The protocol in use is Maxwell, which batches multiple events into a single message. From the source code, max-batch-size does not take effect for this protocol. Maxwell is not an officially GA protocol of TiCDC.

In the provided configuration, max-message-bytes is 262144, i.e. 256 KB. If a batched message exceeds this value, the "message too large" error occurs.

Since your downstream Kafka limits messages to 4 MB, you can simply leave this parameter alone, or set it to 4 MB as well.
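
Following this advice, a minimal sketch of aligning the changefeed with the broker's 4 MB cap (4194304 bytes); the PD address, broker address, and topic are placeholders:

```shell
# Sketch: set max-message-bytes to the downstream broker's 4 MB limit.
# PD address, broker address, and topic are placeholders.
cdc cli changefeed create \
  --pd="http://127.0.0.1:2379" \
  --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=maxwell&max-message-bytes=4194304"
```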

| username: xxxxxxxx | Original post link

I tried configuring only max-message-bytes=4MB, but it still reported an error, so now I don't know how to configure it.

From testing the various protocols, Maxwell's output is the most readable, so we finally settled on it. Which protocols are officially GA? Which one is recommended? And for a scenario where the downstream Kafka is limited to 4 MB, how should the TiCDC-side parameters be configured under that protocol?

| username: LingJin | Original post link

  1. Currently, the official recommendation is to use canal-json or open-protocol.
  2. If you use canal-json, the default parameter configuration can be used directly (see the sketch after this list).
  3. In canal-json, a single DML event corresponds to one Kafka message. In open-protocol, multiple DML events are packed into one Kafka message, and the max-batch-size parameter takes effect.
  4. You can simply use the default parameters.
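
Based on this recommendation, a minimal sketch of a canal-json changefeed relying on the default sink parameters; the PD address, broker address, topic, and changefeed ID are placeholders:

```shell
# Sketch: canal-json changefeed with default sink parameters.
# PD address, broker address, topic, and changefeed ID are placeholders.
cdc cli changefeed create \
  --pd="http://127.0.0.1:2379" \
  --changefeed-id="single-table-canal-json" \
  --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=canal-json"
```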

| username: xxxxxxxx | Original post link

Sure. Is there a version requirement for this? Does it work on version 4.0.13 or 4.0.16?
We are currently in the testing phase, and if canal-json works on 4.0.13, we won't need to upgrade for now.