Starting a CDC task reports "etcd client outCh blocking too long, the etcdWorker may be stuck"

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 启动CDC任务提示etcd client outCh blocking too long, the etcdWorker may be stuck

| username: juecong

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.7.25-TiDB-v6.1.0
[Reproduction Path] GC was turned off for 34 hours, then a CDC changefeed was created with start-ts set to 34 hours earlier. The CDC logs reported:
etcd client outCh blocking too long, the etcdWorker may be stuck
[Encountered Problem: Phenomenon and Impact]
etcd client outCh blocking too long, the etcdWorker may be stuck
[Resource Configuration]
PD nodes: 3 nodes, 4 vCPU 8G
CDC nodes: 6 nodes, 16 vCPU 32G
[Attachments: Screenshots/Logs/Monitoring]


| username: RenlySir

If the start-ts is older than what GC has retained, the changes from before that point can no longer be read.

| username: redgame

Check whether the CDC nodes have enough resources to support the current workload.

| username: xfworld

It looks like you will have to give up on the earlier start time, for two reasons:

  1. Even if GC has retained the data, the changes accumulated over such a long period will be very large, which is likely to cause CDC to OOM.
  2. If GC has not retained the data, you can only start from the earliest time that GC still retains and catch up from there.

If you need to fill in the missing data, the only options are data comparison or snapshots.
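
To make the second point concrete, here is a minimal sketch of how a start-ts for "34 hours ago" can be computed and compared with a GC safepoint. It relies on the TSO layout used by TiDB (physical Unix milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names are hypothetical, and the safepoint value is assumed to have been read from the cluster separately; the `>=` comparison is an assumption of this sketch, not a statement of TiCDC's exact check.

```python
from datetime import datetime, timedelta, timezone

# A TSO stores Unix milliseconds in the high bits and an
# 18-bit logical counter in the low bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_tso(moment: datetime) -> int:
    """Return the TSO with physical part `moment` (ms precision) and logical part 0.

    `moment` must be timezone-aware.
    """
    physical_ms = (moment - EPOCH) // timedelta(milliseconds=1)
    return physical_ms << LOGICAL_BITS


def tso_to_datetime(tso: int) -> datetime:
    """Return the wall-clock time encoded in a TSO, ignoring the logical bits."""
    physical_ms = tso >> LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=physical_ms)


def start_ts_is_usable(start_ts: int, gc_safepoint: int) -> bool:
    """Assumption of this sketch: a start-ts is usable only if it is not
    older than the GC safepoint."""
    return start_ts >= gc_safepoint


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    wanted_start_ts = datetime_to_tso(now - timedelta(hours=34))
    # Hypothetical safepoint: pretend GC has kept only the last 10 minutes.
    safepoint = datetime_to_tso(now - timedelta(minutes=10))
    print("wanted start-ts:", wanted_start_ts)
    print("usable:", start_ts_is_usable(wanted_start_ts, safepoint))
    print("earliest usable time:", tso_to_datetime(safepoint).isoformat())
```

If the check fails, the safepoint itself is the earliest start-ts worth trying, and anything older has to be recovered by comparison or from a snapshot, as described above.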

| username: tidb菜鸟一只

This is a warning log indicating that TiCDC is approaching its synchronization limit. An error is likely to follow, but not certain. It is recommended to check whether TiCDC's downstream synchronization is working normally.

| username: juecong

Resource utilization on the nodes is very low; the threads simply appear to be deadlocked.

| username: juecong

Thank you all for your replies. I later found that there is no problem when the sink-url connects directly to TiDB on port 4000, but the error occurs when it connects through the external port of the load balancer. There may be a bug in the vendor's load balancer, and I am submitting a ticket for this issue.
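
The comparison described here amounts to running two otherwise identical changefeeds whose sink URIs differ only in host and port. A minimal sketch of building the two URIs follows; the hostnames, ports, user, and password are all hypothetical placeholders, and `build_sink_uri` is an illustrative helper, not part of any TiCDC tooling.

```python
from urllib.parse import quote, urlsplit


def build_sink_uri(host: str, port: int, user: str, password: str) -> str:
    """Build a MySQL-protocol sink URI, percent-encoding the credentials
    so characters such as '@' or '/' do not break the URI."""
    return (
        f"mysql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/"
    )


def endpoint_of(uri: str) -> tuple:
    """Return the (host, port) that a sink URI points at."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port


# Hypothetical endpoints: one TiDB server reached directly, one load balancer.
direct = build_sink_uri("tidb-1.example.internal", 4000, "cdc_user", "p@ss/word")
via_lb = build_sink_uri("lb.example.internal", 3306, "cdc_user", "p@ss/word")

if __name__ == "__main__":
    for label, uri in (("direct", direct), ("via LB", via_lb)):
        print(label, endpoint_of(uri))
```

Keeping everything but the endpoint identical makes it easier to attribute a difference in behavior to the load balancer rather than to credentials or changefeed settings.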

| username: TiDBer_vfJBUcxl

:+1:

| username: RenlySir

A reminder: the load balancer must use the least-connections algorithm. Also, is this the LB from Mobile Cloud?
