How to Balance TiCDC Changefeed Tasks Across 4 Instances

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc changefeeds 任务如何均衡分发到4个实例

| username: wluckdog

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】v6.1.0
【Reproduction Path】

  1. The original TiCDC instances are xxx.xxx1:8300 and xxx.xxx2:8300, running 22 changefeed tasks.
  2. I added new cdc instances xxx.xxx1:8310 and xxx.xxx2:8310, but no changefeed tables were allocated to them. After restarting cdc, the tables were distributed to the two new instances on port 8310.
  3. How can the changefeed tasks be distributed across all 4 cdc instances?

【Resource Configuration】
【Attachments: Screenshots / Logs / Monitoring】

| username: Meditator | Original post link

  1. Currently, CDC tasks cannot be balanced and scheduled automatically. If you need to balance them, you have to do it manually.
  2. CDC does not distribute work by the number of tasks. Instead, the underlying layer routes individual tables to instances.

You can check this out:

| username: wluckdog | Original post link

So the underlying layer routes by table. Is the assignment fixed once a new task is created?
Why is the CDC instance on port 8300 not assigned any tasks?

| username: Meditator | Original post link

Yes, it is fixed unless a capture (cdc-server) goes down, in which case its tables are rebalanced to the remaining cdc-servers on a table-by-table basis.

| username: wluckdog | Original post link

How does the number of CDC process instances affect extraction? For example, what is the difference between having 2 CDC instances and 8 instances synchronizing CDC tasks?

| username: asddongmen | Original post link

The tables replicated by a changefeed are allocated across the captures based on the number of tables.
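
To make "allocated based on the number of tables" concrete, here is a minimal Python sketch that compares a current table-to-capture assignment with an even split. It is only an illustration of count-based balancing, not TiCDC's actual scheduler; the helper names and the 10-table assignment are made up for the example, and the capture addresses reuse the placeholders from the question.

```python
def table_counts(assignment, captures):
    """Count tables per capture, including captures that own no tables."""
    counts = {capture: 0 for capture in captures}
    for capture in assignment.values():
        counts[capture] += 1
    return counts


def balanced_targets(num_tables, captures):
    """Split num_tables as evenly as possible; earlier captures absorb the remainder."""
    base, extra = divmod(num_tables, len(captures))
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(captures)}


captures = ["xxx.xxx1:8300", "xxx.xxx2:8300", "xxx.xxx1:8310", "xxx.xxx2:8310"]

# Hypothetical state: 10 tables, all sitting on the two port-8310 instances.
assignment = {table_id: captures[2 + table_id % 2] for table_id in range(10)}

current = table_counts(assignment, captures)
target = balanced_targets(len(assignment), captures)

# Positive surplus: the capture holds more tables than its even share.
surplus = {c: current[c] - target[c] for c in captures}
```

With this made-up state, both port-8300 captures show a negative surplus, which matches the symptom described in the question: some instances own no tables at all.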

For the issue you encountered, please execute the ./cdc version command to get the CDC version information and paste it here to help us troubleshoot.
In theory, CDC should automatically load balance the tables. If it does not automatically load balance:

  1. You can use the OpenAPI to trigger scheduling manually. Refer to: TiCDC OpenAPI v1 | PingCAP Documentation Center
  2. If the above method does not work, consider pausing and then resuming the changefeed.
  3. If neither of the above methods works, finally consider restarting the CDC owner node to refresh all states.
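
As a rough illustration of step 1, the Python sketch below builds the HTTP requests without sending them. The endpoint paths (`/tables/rebalance_table` and `/tables/move_table`) and the `capture_id` / `table_id` body fields are written from memory of the TiCDC OpenAPI v1 documentation, so treat them as assumptions and check the docs for your version. The server address, changefeed ID, capture ID, and table ID are placeholders.

```python
import json
import urllib.request

API_PREFIX = "/api/v1/changefeeds"  # assumed OpenAPI v1 prefix


def rebalance_request(server, changefeed_id):
    """Build a POST that asks the owner to rebalance one changefeed's tables."""
    url = f"http://{server}{API_PREFIX}/{changefeed_id}/tables/rebalance_table"
    return urllib.request.Request(url, data=b"", method="POST")


def move_table_request(server, changefeed_id, capture_id, table_id):
    """Build a POST that asks the owner to move one table to a given capture."""
    url = f"http://{server}{API_PREFIX}/{changefeed_id}/tables/move_table"
    body = json.dumps({"capture_id": capture_id, "table_id": table_id}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values; nothing is sent until urlopen is called.
req = rebalance_request("127.0.0.1:8300", "my-changefeed")
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes it easy to print and review the URL before touching a production cluster.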

| username: wluckdog | Original post link

I have already restarted the CDC task, but the distribution is still uneven.
I have also adjusted the CDC task through commands, but it becomes uneven again after restarting.

| username: neilshen | Original post link

Uneven scheduling was improved in v6.2.0, which automatically balances the number of tables on each TiCDC node. In earlier versions, you need to trigger balanced scheduling manually through the API. For the method, refer to asddongmen's answer.
