TiCDC synchronizes messages to Kafka with old value enabled. What mode should be used to ensure global time order of messages across multiple partitions? Looking forward to expert advice!

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

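Original topic: TiCDC syncs messages to Kafka with oldvalue enabled. Which mode should be used, and how can global time order of messages across multiple partitions be guaranteed? Thanks in advance for any tips~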

| username: TiDBer_aKu9dgpb

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Encountered Problem: Problem Phenomenon and Impact] TiCDC synchronizes messages to Kafka, with old value enabled. What mode should be used to ensure global time order of messages across multiple partitions?

Thank you for sharing your practical experience~

| username: dba-kit | Original post link

Could you elaborate on your requirements? Because of how Kafka works, ordering is only guaranteed within a single partition, whichever mode you choose. Kafka gives no guarantee of time order across partitions.
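To make this concrete, here is a small Python simulation (not TiCDC or Kafka code). It assumes, hypothetically, that each change is a `(commit_ts, row_id)` pair and that the producer picks a partition with `row_id % NUM_PARTITIONS` as a stand-in for a key hash. Each partition stays ordered, and changes to the same row stay ordered, but a consumer that reads the partitions independently does not see global commit order.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(row_id: int) -> int:
    # Stand-in for a key hash: the same row always maps to the same partition.
    return row_id % NUM_PARTITIONS


def produce(events):
    """Append (commit_ts, row_id) events to per-partition logs in commit order."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for commit_ts, row_id in events:
        partitions[partition_for(row_id)].append((commit_ts, row_id))
    return partitions


def drain_one_partition_at_a_time(partitions):
    """Read partitions one after another, like consumers with no coordination."""
    out = []
    for log in partitions:
        out.extend(log)
    return out


def per_row_order_kept(stream):
    """True if, for every row_id, commit_ts values appear in increasing order."""
    last = defaultdict(lambda: -1)
    for commit_ts, row_id in stream:
        if commit_ts <= last[row_id]:
            return False
        last[row_id] = commit_ts
    return True


# Ten changes to five rows, committed at ts 1..10.
events = [(ts, ts % 5) for ts in range(1, 11)]
partitions = produce(events)
stream = drain_one_partition_at_a_time(partitions)

print("each partition ordered:", all(log == sorted(log) for log in partitions))
print("per-row order kept:", per_row_order_kept(stream))
print("global order kept:", stream == sorted(stream))
```

Run as-is, it prints `True`, `True`, `False`: routing every change of a row to the same partition keeps per-row order, while global commit order needs extra coordination on the consumer side.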

| username: dba-kit | Original post link

However, once the enable-tidb-extension parameter is enabled, TiCDC constructs a WATERMARK message every second. Consumers can use these messages to indirectly coordinate the order of messages across multiple partitions.
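A minimal Python sketch of how a consumer might use such watermarks. Assumptions (not taken from TiCDC itself): messages have already been decoded into row events and per-partition watermark events; a watermark `W` on a partition means that partition will not deliver any further row with `commit_ts <= W` (check the protocol documentation for the exact boundary); `WatermarkMerger` is a hypothetical name. Rows are buffered until every partition has reported a watermark, and then everything at or below the smallest watermark is released in commit-ts order.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WatermarkMerger:
    """Hypothetical helper that re-orders row events from several partitions.

    Assumes a watermark W on a partition means that partition will not
    deliver any further row event with commit_ts <= W.
    """

    num_partitions: int
    # Latest watermark seen per partition.
    watermarks: Dict[int, int] = field(default_factory=dict)
    # Min-heap of (commit_ts, arrival_seq, payload); arrival_seq breaks ties stably.
    pending: List[Tuple[int, int, str]] = field(default_factory=list)
    arrival_seq: int = 0

    def on_row(self, commit_ts: int, payload: str) -> None:
        """Buffer a row event; it is not safe to emit until all watermarks pass it."""
        heapq.heappush(self.pending, (commit_ts, self.arrival_seq, payload))
        self.arrival_seq += 1

    def on_watermark(self, partition: int, ts: int) -> List[Tuple[int, str]]:
        """Record a watermark and return the rows that are now globally ordered."""
        # Never move a partition's watermark backwards.
        self.watermarks[partition] = max(ts, self.watermarks.get(partition, ts))
        if len(self.watermarks) < self.num_partitions:
            return []  # some partition has not reported yet, so nothing is safe
        safe_ts = min(self.watermarks.values())
        ready = []
        while self.pending and self.pending[0][0] <= safe_ts:
            commit_ts, _, payload = heapq.heappop(self.pending)
            ready.append((commit_ts, payload))
        return ready


merger = WatermarkMerger(num_partitions=2)
merger.on_row(105, "update t1 id=1")  # arrived on partition 0
merger.on_row(101, "insert t2 id=7")  # arrived on partition 1
merger.on_row(110, "delete t1 id=3")  # arrived on partition 0
print(merger.on_watermark(0, 108))  # partition 1 has not reported yet
print(merger.on_watermark(1, 108))  # rows with commit_ts <= 108, in order
```

This prints an empty list, then the rows at ts 101 and 105; the row at ts 110 stays buffered until both partitions report a watermark of at least 110. The trade-off is latency: one idle or lagging partition holds back all output. Deduplication of replayed messages is not handled here.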

| username: redgame | Original post link

Try this: to keep message timestamps in order within a Kafka partition, set the Kafka producer's acks parameter to "all".

| username: ljluestc | Original post link

In TiCDC, when synchronizing messages to Kafka and enabling the oldvalue option, you can use the legacy mode to ensure the global time order of messages across multiple partitions.

TiCDC uses incremental mode by default, which can provide better performance and lower latency. However, it does not guarantee global time order across multiple partitions.

In scenarios where global time order of messages across multiple partitions is required, such as when using the oldvalue option, you can switch to legacy mode. Legacy mode ensures that the order of messages is strictly determined by their commit timestamps, regardless of which partition they belong to.

To configure TiCDC to use legacy mode, set the enable-old-value option to true in the TiCDC configuration file (cdc.toml). With enable-old-value enabled, TiCDC uses legacy mode to keep messages across multiple partitions in global time order, so the data consumed from Kafka stays consistent.

It should be noted that the performance overhead of legacy mode may be slightly higher compared to the default incremental mode, so you should weigh this based on your specific requirements and workload characteristics.

| username: TiDBer_aKu9dgpb | Original post link

Our downstream Kafka runs on EMR, and we now want to increase overall data throughput by consuming from multiple partitions. The business has little tolerance for data errors, so is there a reliable multi-partition solution that still preserves ordering?

| username: TiDBer_aKu9dgpb | Original post link

I'll test this method later.

| username: TiDBer_aKu9dgpb | Original post link

Hmm, can this ensure the ordering of messages across multiple partitions?

| username: TiDBer_aKu9dgpb | Original post link

After going around in circles, it turns out the earlier solution does work; see tiflow/docs/design/2020-02-24-ticdc-mq-protocol-cn.md on the master branch of pingcap/tiflow on GitHub.
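The linked design doc describes TiCDC's Open Protocol, where Resolved events play the same role as the WATERMARK messages mentioned earlier in the thread. Below is a rough Python sketch that classifies one decoded event. It assumes the batched Kafka message has already been split into individual key/value JSON strings. The field names (`t` for the event type with 1 = row changed, 2 = DDL, 3 = resolved; `ts` for the timestamp; `u`, `d`, and `p` for new, deleted, and previous column values, with `p` present only when old value is enabled) are as we understand them from the Open Protocol docs, so verify them against your TiCDC version. `classify` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Event types in the Open Protocol key's "t" field (verify against the protocol doc).
ROW_CHANGED, DDL, RESOLVED = 1, 2, 3


def classify(key_json: str, value_json: Optional[str]) -> Tuple[str, int, Dict[str, Any]]:
    """Return (kind, ts, detail) for one already-unbatched Open Protocol event."""
    key = json.loads(key_json)
    ts = key["ts"]
    if key["t"] == RESOLVED:
        # Resolved ts: feed this into per-partition watermark tracking.
        return "resolved", ts, {}
    if key["t"] == DDL:
        return "ddl", ts, json.loads(value_json) if value_json else {}
    if key["t"] != ROW_CHANGED:
        raise ValueError(f"unknown event type {key['t']}")
    value = json.loads(value_json) if value_json else {}
    if "d" in value:
        return "delete", ts, {"old": value["d"]}
    if "p" in value:
        # "p" carries the pre-update row image, present when old value is enabled.
        return "update", ts, {"old": value["p"], "new": value.get("u", {})}
    # Without "p", an insert and an update cannot be told apart.
    return "upsert", ts, {"new": value.get("u", {})}


print(classify('{"ts": 7, "t": 3}', None))
```

Row events would then be buffered and released against the smallest resolved ts across all partitions, in commit-ts order.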

| username: TiDBer_aKu9dgpb | Original post link

Thank you, everyone.