TiCDC latency is high, consuming a lot of CPU and memory

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 延迟 很大,占cpu,内存也很高

| username: suqingbin0315

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.1
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
The upstream TiDB performed many update operations on several tables, causing CDC to spike in CPU, memory, and network usage, with significant delays. The logs show the following error:
error="[CDC:ErrEventFeedEventError]eventfeed returns event error: not_leader:<region_id:844026 leader:<id:5329197 store_id:17561 > > "] [sri={}]
Afterwards, CDC was stopped.
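For the record, checkpoint lag and changefeed state can be confirmed from the TiCDC command line before digging into Grafana. A minimal sketch, assuming the cdc binary is available on a TiCDC node; the PD address and the changefeed ID are placeholders:

# List all changefeeds with their state and checkpoint
cdc cli changefeed list --pd=http://127.0.0.1:2379

# Inspect a single changefeed in detail (the ID comes from the list output)
cdc cli changefeed query --changefeed-id=<changefeed-id> --pd=http://127.0.0.1:2379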
[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]

| username: xfworld | Original post link

Is the TiDB environment normal? Check the status of the regions through Grafana…
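Besides the Grafana panels, a quick way to see whether the region count of the updated tables has ballooned is to query INFORMATION_SCHEMA on any TiDB node. A rough sketch; the connection parameters and schema name are placeholders:

# Count regions per table for the affected schema
mysql -h 127.0.0.1 -P 4000 -u root -e "
  SELECT db_name, table_name, COUNT(*) AS region_count
  FROM INFORMATION_SCHEMA.TIKV_REGION_STATUS
  WHERE db_name = 'your_db'
  GROUP BY db_name, table_name
  ORDER BY region_count DESC;"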

| username: suqingbin0315 | Original post link

TiDB is functioning normally. After executing the update operation, the number of regions increased significantly.

| username: xfworld | Original post link

Okay, then you can check these metrics:

Reference documentation:

| username: suqingbin0315 | Original post link

  • CDC pending bytes in memory (the memory used by the CDC module on the TiKV node): this metric is not available.
    Captured region count is as follows:

| username: suqingbin0315 | Original post link

[The original reply contained only a screenshot.]

| username: suqingbin0315 | Original post link

I have encountered the same problem. The solution is to add the following configuration to the mydumper section of the tidb-lightning.toml file:

[mydumper]
no-schema = true

| username: suqingbin0315 | Original post link

Many metrics cannot be found.

| username: suqingbin0315 | Original post link

After the update operation on the upstream TiDB around 4 PM on May 15th, both the leader count and the region count increased significantly.

| username: xfworld | Original post link

You need to select a time range, because CDC has stopped. It would be better to choose the period before the issue occurred.

| username: xfworld | Original post link

It’s been over 14 hours; clearly there’s an issue downstream, and it can’t keep up…

From this timeline it’s easy to see the problem: what went wrong at 16:00 on May 15th? After that point, it started to fall behind… :upside_down_face:

| username: suqingbin0315 | Original post link

On May 15th at 16:00, many tables were updated, resulting in a significant increase in QPS. Consequently, CDC started consuming a lot of CPU, memory, and network resources. The downstream status appears normal.

| username: xfworld | Original post link

That is probably a large transaction, right?

TiCDC has a parameter for splitting large transactions, but it was introduced in version 6.2.x.

Starting from version 6.2, you can control whether TiCDC splits single-table transactions by configuring the sink URI parameter transaction-atomicity. Splitting transactions can significantly reduce the latency and memory consumption when replicating large transactions to a MySQL sink.

I just checked, and this feature is also supported in version 6.1.1.
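For reference, the parameter goes into the changefeed's sink URI. A minimal sketch of creating a changefeed that splits single-table transactions, assuming a MySQL-compatible downstream; the PD address, downstream address, and credentials are placeholders:

# transaction-atomicity=none lets TiCDC split single-table transactions
cdc cli changefeed create \
  --pd=http://127.0.0.1:2379 \
  --changefeed-id="repl-task-1" \
  --sink-uri="mysql://root:password@127.0.0.1:3306/?transaction-atomicity=none"

For an existing changefeed, the same sink URI can be applied through cdc cli changefeed pause / update / resume.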

| username: suqingbin0315 | Original post link

Is updating many tables with conditions like “where id = 111” considered a large transaction?

| username: xfworld | Original post link

However, I saw a fix related to this feature in 6.1.3.

I guess the OOM is related to this configuration…

| username: xfworld | Original post link

Most of them are known bugs that have already been fixed. If possible, consider upgrading to a later patch version.

| username: nongfushanquan | Original post link

The large number of region leader migrations in the upstream at 16:00 on May 15th would cause the connections between TiCDC and the upstream TiKV to be re-established, triggering incremental scans and increased latency. However, it is recommended to first upgrade TiCDC to version 6.1.6 and then check whether the issue still occurs.
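If you take the upgrade route, a patch-release upgrade is normally done with TiUP; note that tiup cluster upgrade moves the whole cluster, not only the cdc component, to the target version. A sketch with the cluster name as a placeholder:

# Check the current topology and component versions
tiup cluster display my-cluster

# Rolling upgrade to the patch release
tiup cluster upgrade my-cluster v6.1.6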

| username: suqingbin0315 | Original post link

Hello, may I ask whether I can scale out a v6.1.6 TiCDC node in my current v6.1.1 cluster?
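For context, adding a TiCDC node with TiUP is a scale-out operation driven by a small topology file, and TiUP normally deploys the new node at the cluster's current version. A sketch; the host and cluster name are placeholders:

# scale-out.yaml: one new TiCDC node
cat > scale-out.yaml <<'EOF'
cdc_servers:
  - host: 10.0.1.10
EOF

# Add the node to the existing cluster
tiup cluster scale-out my-cluster scale-out.yaml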

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.