Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: How to enable this new feature: TiCDC synchronizes data to Kafka, throughput increased from 4,000 rows per second to 35,000 rows per second, and replication latency reduced to 2 seconds.
[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Phenomenon and Impact] After upgrading the cluster from v5.4.0 to v6.5.0, I found that the performance of TiCDC tasks synchronizing data to downstream Kafka did not improve. When batch transactions involve large tables, the task gets stuck and synchronization to downstream Kafka becomes very slow.
I would like to ask: which parameters need to be enabled or configured to improve this throughput? Thank you!
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
I went through the official configuration documentation but could not find the relevant parameter.
To enable the new feature of TiCDC synchronizing data to Kafka, you need to follow these steps:
- Ensure your TiCDC version is v4.0.9 or higher, as this feature was introduced in that version.
- Follow the instructions in the official TiCDC documentation to create a changefeed that synchronizes data to Kafka; see Replicate Data to Kafka for detailed steps (a minimal example command is sketched after this list).
- When creating the changefeed, you can control the throughput of TiCDC synchronizing data to Kafka by configuring the `sink-uri` parameter. Specifically, you can add the following configuration items to `sink-uri`:

```
kafka.producer.config.bootstrap.servers=<kafka-broker-list>
kafka.producer.config.max.request.size=<max-request-size>
```

Here, `<kafka-broker-list>` is the list of brokers in your Kafka cluster, and `<max-request-size>` is the maximum size of a single Kafka message. By adjusting these two parameters appropriately, you can increase the throughput of TiCDC synchronizing data to Kafka.
Note that if you encounter issues while using TiCDC to synchronize data to Kafka, you can refer to the Troubleshoot TiCDC section of the documentation.
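For reference, a minimal sketch of creating such a changefeed with the cdc cli; the server address, changefeed ID, topic name, and the 10 MB `max-message-bytes` value are placeholder assumptions, not values taken from this thread:

```shell
# Sketch: create a changefeed that writes to Kafka using the canal-json protocol.
# Assumed placeholders: TiCDC server at 127.0.0.1:8300, topic "test-topic",
# changefeed ID "kafka-task".
cdc cli changefeed create \
  --server=http://127.0.0.1:8300 \
  --changefeed-id="kafka-task" \
  --sink-uri="kafka://127.0.0.1:9092/test-topic?protocol=canal-json&max-message-bytes=10485760"
```

Keep in mind that `max-message-bytes` cannot usefully exceed the `message.max.bytes` limit configured on the Kafka brokers, since the brokers will reject larger messages.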
Why is the default value of this parameter 10 MB in v6.5.0? Isn't it the case that the larger the value, the higher the throughput?
I added the following to the configuration file:

```
kafka.producer.config.max.request.size=1048576
```

When I then updated the task, an error occurred:

```
Error: component TiCDC changefeed's config file ./kafka-to-tianjin-pro-01.toml contained unknown configuration options: sink.kafka.producer.config.max.request.size
```
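For comparison, a changefeed configuration file that v6.5 does accept looks roughly like the sketch below; the schema has no `sink.kafka.producer.*` keys, which is why the option above is reported as unknown. The dispatcher rule shown is only an illustrative assumption:

```toml
# Sketch of a v6.5 changefeed config file; there is no kafka.producer.* section.
[sink]
protocol = "canal-json"
# Assumed example rule: match all tables, partitioning rows by index value.
dispatchers = [{ matcher = ["*.*"], partition = "index-value" }]
```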
The higher efficiency of the v6.5 Kafka sink comes from a design change: previously a single sink writing to the downstream would slow down the entire pipeline, whereas now multiple concurrent writes are used by default. You can test the efficiency directly.
TiCDC made many performance optimizations to the Kafka sink in version 6.5, and these optimizations do not need to be enabled manually.
Regarding your error: these parameters currently need to be written into the sink-uri, for example `--sink-uri "kafka://127.0.0.1:9092/test?max-message-bytes=671088&protocol=canal-json"`. Work is underway to support setting these parameters directly in the configuration file.
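As a rough sketch of that workflow (the server address and changefeed ID are placeholder assumptions; a changefeed must be paused before its sink-uri can be changed, then resumed):

```shell
# Sketch: move the message-size setting into the sink-uri of an existing changefeed.
# Assumed placeholders: TiCDC server at 127.0.0.1:8300, changefeed ID "kafka-task".
cdc cli changefeed pause  --server=http://127.0.0.1:8300 -c kafka-task
cdc cli changefeed update --server=http://127.0.0.1:8300 -c kafka-task \
  --sink-uri="kafka://127.0.0.1:9092/test?max-message-bytes=671088&protocol=canal-json"
cdc cli changefeed resume --server=http://127.0.0.1:8300 -c kafka-task
```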