How to Choose a CDC Distributor When the Table Has a Primary Key

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc分发器在表有主键的情况下,如何选择分发器

| username: wluckdog

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Encountered Problem: Phenomenon and Impact]

  1. When CDC synchronizes a single table that has a primary key, should the ts or index-value dispatcher be chosen? Which one distributes data more evenly, and what are the pros and cons of each?
  2. If the primary key is a relatively long varchar value, should ts or index-value be chosen?
  3. When the table has a primary key and is updated frequently under high concurrency, should ts or index-value be chosen for faster, safer, and more stable synchronization?
| username: xfworld

This depends on how the downstream Kafka topics are configured. The behavior differs depending on whether each table has its own topic or multiple tables share one. In addition, whether the topic has a single partition or multiple partitions affects the order of the data.

So a choice has to be made: strictly guarantee data order, or favor higher throughput and processing performance. Whether you pick ts or index-value, this is the main consideration.
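
The trade-off can be sketched with a small simulation. This is only an illustration under stated assumptions: the partition count, the sample events, and the hash functions (`commit_ts % n` and `zlib.crc32`) are hypothetical stand-ins, not TiCDC's actual implementation. The only behavior taken from the dispatchers themselves is that ts dispatches by the commitTs of the row change and index-value dispatches by the primary key or unique index value.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_by_ts(commit_ts: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for the ts dispatcher: pick a partition from the commitTs."""
    return commit_ts % num_partitions


def partition_by_index_value(pk: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for the index-value dispatcher: pick a partition from the primary key."""
    return zlib.crc32(pk.encode("utf-8")) % num_partitions


# Hypothetical row changes: (commit_ts, primary_key, operation)
events = [
    (100, "user-42", "INSERT"),
    (101, "user-42", "UPDATE"),
    (102, "user-42", "UPDATE"),
    (103, "user-7", "INSERT"),
]

for commit_ts, pk, op in events:
    print(
        f"{op:6} pk={pk:8} commit_ts={commit_ts} "
        f"ts->p{partition_by_ts(commit_ts)} "
        f"index-value->p{partition_by_index_value(pk)}"
    )
```

In this sketch, the three changes to `user-42` land in three different partitions under ts, while index-value always sends them to the same partition. That is the core of the choice: ts spreads a hot row across partitions, while index-value keeps all changes of one row together, at the cost of possible skew if a few keys receive most of the updates.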

| username: wluckdog

  1. If there is one upstream table and one downstream topic, so the relationship is one-to-one, and the table changes frequently, should ts or index-value be used?
    Configuration file:
    case-sensitive = true
    enable-old-value = true
    [filter]
    ignore-txn-start-ts = [1, 2]
    rules = ['test1.tt']
    [mounter]
    worker-num = 16
    [sink]
    dispatchers = [
    {matcher = ['test1.tt'], topic = "Topic Expression 1", partition = "ts" },
    ]
    protocol = "canal-json"

  2. How should I understand “whether the topic has a single partition or multiple partitions affects the order of the data”? Could you give an example?

  3. Given a one-to-one relationship between table and topic, my understanding is that partitioning by ts ensures data order, while index-value offers higher throughput and processing capacity.
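
Question 2 can be made concrete with a minimal sketch. Kafka guarantees ordering only within a single partition, so once events for one row are spread over several partitions, the order a consumer observes is not necessarily the commit order. The code below is a simulation, not a Kafka client: the `produce` and `consume_partition_by_partition` helpers, the sample events, and the modulo-based partition choice are all hypothetical, and reading partitions one after another is just one possible interleaving.

```python
from collections import defaultdict


def produce(events, num_partitions, key_fn):
    """Append each event to the partition chosen by key_fn(event) % num_partitions."""
    partitions = defaultdict(list)
    for event in events:
        partitions[key_fn(event) % num_partitions].append(event)
    return partitions


def consume_partition_by_partition(partitions):
    """One possible consumer behavior: drain partition 0, then 1, then 2, ..."""
    seen = []
    for partition_id in sorted(partitions):
        seen.extend(partitions[partition_id])
    return seen


# Three successive changes to the same row: (commit_ts, primary_key, value)
events = [(100, "a", "v1"), (101, "a", "v2"), (102, "a", "v3")]


def by_commit_ts(event):
    return event[0]


single = consume_partition_by_partition(produce(events, 1, by_commit_ts))
multi = consume_partition_by_partition(produce(events, 3, by_commit_ts))

print("1 partition :", [e[2] for e in single])
print("3 partitions:", [e[2] for e in multi])
```

With a single partition the consumer sees the changes in commit order. With three partitions and a commitTs-based split, this particular consumer sees them in a different order, so it would have to compare commit timestamps itself before applying them.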

| username: xfworld

I suggest setting up an environment to test and see which one suits you better.

I’ve already explained the idea clearly, so you can choose on your own.
