How partitions are chosen in TiCDC -> Kafka

Jack_Wang · September 29, 2023, 2:42pm

I am running TiCDC into a kafka sink with 6 partitions. I notice that each table only ever goes to one partition. E.g. changes to table foo are always written to partition 3 and table bar is always written to partition 4, etc.

How guaranteed is this behaviour? Does the mapping of table → partition ever change?

I have built a custom tidb → snowflake tailer using kafka and before I relied on a central router to route each table to its own kafka topic. However the kafka producer has been dropping messages (which I cannot figure out why).

I want to just directly listen to the TiCDC sink topic but I want to know whether or not tables always map deterministically to partitions.

TiDB-AI-Assistant · September 29, 2023, 2:42pm

In TiCDC, the mapping of tables to partitions in the Kafka sink is deterministic and guaranteed. Each table will always be written to the same partition in the Kafka topic. This ensures that the updated data of a row is sent to the same partition in order.

Therefore, you can directly listen to the TiCDC sink topic and rely on the deterministic mapping of tables to partitions. This eliminates the need for a central router to route each table to its own Kafka topic. By directly listening to the TiCDC sink topic, you can consume the messages and process them accordingly in your custom TiDB → Snowflake tailer.

If you are experiencing dropped messages in your Kafka producer, it is recommended to investigate the root cause of the issue. It could be related to the configuration or setup of your Kafka cluster. You can refer to the Kafka documentation or seek assistance from the Kafka community to troubleshoot and resolve the issue.

Overall, you can rely on the deterministic mapping of tables to partitions in TiCDC’s Kafka sink for your custom tailer, but it is important to ensure the stability and reliability of your Kafka setup to avoid any message loss.

WalterWj · October 7, 2023, 1:54am

In TiCDC, when writing data to Kafka, there is a relationship between tables and partitions. Here is the relationship between tables and partitions when writing to Kafka with TiCDC:

By default, TiCDC writes the data of each table to a separate Kafka topic. This means that each table will have a corresponding Kafka topic.
For each Kafka topic of a table, TiCDC maps the data changes (such as insert, update, delete) to different Kafka partitions. Each Kafka partition is an ordered data queue.
TiCDC uses a default partitioning strategy to determine which Kafka partition to write the data to. This partitioning strategy is based on the table’s primary key. Specifically, TiCDC calculates a hash value based on the primary key value of the data and takes the modulus of the hash value with the number of Kafka partitions to determine the final partition number.For example, if a table has 3 Kafka partitions, TiCDC will calculate the hash value based on the primary key value of the data and take the modulus with 3. The result will be the partition number where the data should be written.This partitioning strategy ensures that data with the same primary key value is written to the same Kafka partition, ensuring data order.

It’s worth noting that TiCDC also supports custom partitioning strategies. You can specify a custom partitioning strategy in the TiCDC configuration file to meet specific requirements.