Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Starting a CDC task reports "etcd client outCh blocking too long, the etcdWorker may be stuck"
[TiDB Usage Environment] Production Environment
[TiDB Version] 5.7.25-TiDB-v6.1.0
[Reproduction Path] GC was turned off for 34 hours, then a CDC task was created with start-ts pointing to 34 hours ago. The CDC logs reported:
etcd client outCh blocking too long, the etcdWorker may be stuck
[Encountered Problem: Phenomenon and Impact]
etcd client outCh blocking too long, the etcdWorker may be stuck
[Resource Configuration]
PD nodes: 3 nodes, 4 vCPU 8G
CDC nodes: 6 nodes, 16 vCPU 32G
[Attachments: Screenshots/Logs/Monitoring]
If the start-ts is older than what GC has retained, the changes from the period that was exceeded can no longer be read.
Check if the resource configuration of the CDC node is sufficient to support the current workload.
It seems the earlier start time has to be given up, for two reasons:
- Even if GC has retained the data, the changes accumulated over such a long period will be very large, which is likely to cause CDC to OOM.
- If GC has not retained the data, the only choices are the earliest time GC still has available or the current time.
To backfill the missing data, the only options are data comparison or snapshots.
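For anyone who wants to check this before creating the task, here is a minimal Python sketch of the comparison described above. It relies on the TSO layout used by TiDB (physical Unix time in milliseconds in the high bits, an 18-bit logical counter in the low bits). The helper names are hypothetical, and the GC safe point value has to be read from your own cluster (TiDB records it in the `mysql.tidb` table as `tikv_gc_safe_point`). Whether a start-ts exactly equal to the safe point is accepted is not verified here.

```python
from datetime import datetime, timedelta, timezone

# TSO layout: physical milliseconds in the high bits,
# an 18-bit logical counter in the low bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_to_tso(t: datetime) -> int:
    """Convert a timezone-aware datetime to a TSO with logical part 0."""
    physical_ms = (t - EPOCH) // timedelta(milliseconds=1)
    return physical_ms << LOGICAL_BITS


def tso_to_time(tso: int) -> datetime:
    """Recover the physical time (UTC) encoded in a TSO."""
    physical_ms = tso >> LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=physical_ms)


def earliest_usable_start_ts(wanted_tso: int, gc_safe_point_tso: int) -> int:
    """Hypothetical helper: never go further back than the GC safe point."""
    return max(wanted_tso, gc_safe_point_tso)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    wanted = time_to_tso(now - timedelta(hours=34))
    # Assumed value for illustration only; read the real one from the cluster.
    safe_point = time_to_tso(now - timedelta(hours=10))
    start_ts = earliest_usable_start_ts(wanted, safe_point)
    print("start-ts:", start_ts, "=", tso_to_time(start_ts).isoformat())
```

If the chosen start-ts ends up later than the one you wanted, the gap between the two is the part that has to be backfilled by comparison or from a snapshot.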
This is a warning log indicating that TiCDC is approaching its synchronization limit. An error is likely to follow, though not certain. It is recommended to check whether TiCDC's synchronization to the downstream is working normally.
Node resource utilization is very low; the threads simply appear to be deadlocked.
Thank you all for your replies. I later found that there is no problem when the sink-url connects directly to TiDB on port 4000, but the error occurs when it connects to the external port of the load balancer. There may be a bug in the vendor's load balancer, and I am currently submitting a ticket for it.
A reminder: the load balancer must use the least connections algorithm. Also, is this the LB from Mobile Cloud?