TiCDC is extremely slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 速度超级慢

| username: 爱白话的晓辉

In our production environment running v5.4.0, the cluster has 6 TiKV nodes, 2 PD nodes, 2 TiDB nodes, and 1 TiCDC node. Upstream normally writes about 2,000 rows per second, but the replication lag on the TiCDC side is extremely high: only a little over 9,000 rows were written downstream in twenty minutes. Does anyone know a good way to improve this? TiCDC can't really be this inefficient, can it?
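
To put the gap in numbers, here is a rough back-of-the-envelope calculation using only the figures quoted above (about 2,000 rows per second upstream, about 9,000 rows replicated in twenty minutes). The variable names are illustrative and both inputs are approximations from the post, not measured values.

```shell
# Figures quoted in the post (approximate).
UPSTREAM_ROWS_PER_SEC=2000
REPLICATED_ROWS=9000
WINDOW_SECONDS=$((20 * 60))

# Rows the upstream would have written during the same window.
EXPECTED_ROWS=$((UPSTREAM_ROWS_PER_SEC * WINDOW_SECONDS))

# Effective TiCDC throughput in rows per second (awk handles the decimal).
CDC_ROWS_PER_SEC=$(awk -v n="$REPLICATED_ROWS" -v s="$WINDOW_SECONDS" 'BEGIN { printf "%.1f", n / s }')

echo "upstream wrote about ${EXPECTED_ROWS} rows; TiCDC replicated ${REPLICATED_ROWS} rows (${CDC_ROWS_PER_SEC} rows/s)"
```

That works out to roughly 7.5 rows per second against an expected 2.4 million rows, which points to a stalled or repeatedly failing changefeed rather than one that is merely throttled.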

The creation statement is as follows:

tiup cdc cli changefeed create --pd=http://ali-bi-pd-prod001:2379 --changefeed-id="kafka-canal-json" --sink-uri="kafka://kafka_broker:9092/bi_prod_to_doris_from_tidb?kafka-version=2.2.0&protocol=canal-json"

| username: 爱白话的晓辉 | Original post link

Latency graph

| username: neilshen | Original post link

If convenient, please use PingCAP Clinic to collect the cluster's monitoring data and logs for analysis. Also confirm whether the workload includes large transactions, since TiCDC v5.4 has performance issues when handling large transactions.

| username: 爱白话的晓辉 | Original post link

There are no large transactions; every commit is a single row, and the widest table has 100 columns.

| username: xiaour | Original post link

It seems that your machine’s IO load is too high. Check the monitoring data.

| username: alfred | Original post link

Monitor IO at the OS level to see if there are any bottlenecks.

| username: nongfushanquan | Original post link

Check whether the problem is downstream. For a Kafka sink, your workload is actually quite small.

| username: forever | Original post link

Is the slowness in writing to Kafka or in consuming from Kafka?

| username: 爱白话的晓辉 | Original post link

Writing to Kafka is slow.

| username: jansu-dev | Original post link

  1. Please use PingCAP Clinic to collect monitoring data for troubleshooting. Usage guide → PingCAP Clinic Quick Start Guide | PingCAP Docs
  2. The sink URI does not appear to set partition-num, so we cannot tell whether the problem is on the Kafka side.
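
As a sketch of the second point, the sink URI from the original post could carry an explicit partition-num parameter. The value 6 below is purely an assumption for illustration; it should not exceed the number of partitions the topic actually has. The script only prints the command for review and does not run it against the cluster.

```shell
# Hypothetical value: set this to at most the topic's real partition count.
PARTITION_NUM=6

# PD address, broker, topic and protocol are copied from the original post.
PD_ADDR="http://ali-bi-pd-prod001:2379"
SINK_URI="kafka://kafka_broker:9092/bi_prod_to_doris_from_tidb?kafka-version=2.2.0&protocol=canal-json&partition-num=${PARTITION_NUM}"

# Build the command as a string and print it instead of executing it.
CREATE_CMD="tiup cdc cli changefeed create --pd=${PD_ADDR} --changefeed-id=\"kafka-canal-json\" --sink-uri=\"${SINK_URI}\""
echo "${CREATE_CMD}"
```

Since a changefeed with this ID already exists, the existing one would have to be removed or a different changefeed ID chosen before running the printed command.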
| username: 爱白话的晓辉 | Original post link

In the end, the cause was that TiCDC v5.4.0 could not handle data containing special characters, which led to continuous retries and repeated restarts of the CDC service. Upgrading to v5.4.2 resolved the issue. We also adjusted several Kafka configuration parameters, and performance is now fine.
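
For anyone hitting the same symptom, a minimal sketch of the check-then-upgrade sequence might look like the following. It assumes the cluster is managed with tiup cluster; the cluster name my-tidb-cluster is a placeholder, not something taken from this thread. The commands are printed rather than executed so they can be reviewed first.

```shell
# PD address and changefeed ID are taken from the original post.
PD_ADDR="http://ali-bi-pd-prod001:2379"
CHANGEFEED_ID="kafka-canal-json"

# Placeholder: replace with the real cluster name shown by `tiup cluster list`.
CLUSTER_NAME="my-tidb-cluster"
TARGET_VERSION="v5.4.2"

# 1. Inspect the changefeed state and checkpoint (-s prints a short summary).
QUERY_CMD="tiup cdc cli changefeed query -s --pd=${PD_ADDR} --changefeed-id=${CHANGEFEED_ID}"

# 2. Upgrade the whole cluster to the version that fixed the issue here.
UPGRADE_CMD="tiup cluster upgrade ${CLUSTER_NAME} ${TARGET_VERSION}"

printf '%s\n' "${QUERY_CMD}" "${UPGRADE_CMD}"
```

If the query output shows an error state or a checkpoint that is not advancing, the TiCDC logs are the place to look for the repeated retries before deciding to upgrade.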

| username: Meditator | Original post link

Is there a source for this? Could you share it with us?

| username: yuqi1129 | Original post link

Me too, +1
