Different Data Volumes Written to Kafka by the Drainers of a Primary and a Secondary Cluster

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 主从集群drainer同步到kafka的数据量不同

| username: wjf870128

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.1.2/5.1.4
[Environment Background] The old cluster (v5.1.2) replicates data to the new cluster (v5.1.4) through drainer. Each cluster also has its own drainer writing to Kafka.
[Encountered Problem]
The volume of data that the old cluster's drainer writes to Kafka differs from the volume written by the new cluster's drainer.
Kafka monitoring shows the new cluster producing roughly 25% less data.
However, each drainer's own monitoring reports the same number of events.
Old instance

New instance

What might be causing this?

| username: xfworld

Are the old and new versions coexisting?

| username: wjf870128

The old cluster is version 5.1.2, and the new cluster has been upgraded to version 5.1.4.

| username: xfworld


I'd suggest checking along these dimensions:

  1. Is there any data loss?
  2. Is anything abnormal in the downstream connections?

The raw data volume on its own doesn't seem like a meaningful indicator, does it?

| username: wjf870128

Today, by directly consuming Kafka, we found that the data is consistent. However, it is still unclear why the two clusters are producing different amounts of data.
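For anyone repeating this kind of check, here is a minimal Python sketch of comparing what two topics actually contain, as opposed to how many bytes they report. It assumes you have already consumed each topic's message payloads as `bytes` with a Kafka client of your choice. The helper names `payload_digests` and `compare_topics` and the `normalize` hook are hypothetical. Payloads from two different clusters will likely carry cluster-specific metadata such as commit timestamps, so in practice `normalize` would need to extract only the row data. That decoding step is not shown and is an assumption here.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable


def payload_digests(payloads: Iterable[bytes],
                    normalize: Callable[[bytes], bytes]) -> Counter:
    """Count SHA-256 digests of normalized payloads (order-independent multiset)."""
    return Counter(hashlib.sha256(normalize(p)).hexdigest() for p in payloads)


def compare_topics(old: Iterable[bytes], new: Iterable[bytes],
                   normalize: Callable[[bytes], bytes] = lambda p: p) -> dict:
    """Compare payloads consumed from two topics (hypothetical helper).

    Reports message counts and total raw bytes per side, plus how many
    normalized payloads appear only on one side. Equal counts with zero
    one-sided payloads but different byte totals would mean "same content,
    different size" rather than data loss.
    """
    old_list, new_list = list(old), list(new)
    old_d = payload_digests(old_list, normalize)
    new_d = payload_digests(new_list, normalize)
    return {
        "old_count": len(old_list),
        "new_count": len(new_list),
        "old_bytes": sum(len(p) for p in old_list),
        "new_bytes": sum(len(p) for p in new_list),
        # Counter subtraction keeps only positive remainders
        "only_in_old": sum((old_d - new_d).values()),
        "only_in_new": sum((new_d - old_d).values()),
    }


if __name__ == "__main__":
    # Toy payloads: same messages, different order
    old_msgs = [b"row-1", b"row-2", b"row-2"]
    new_msgs = [b"row-2", b"row-1", b"row-2"]
    print(compare_topics(old_msgs, new_msgs))
```

Comparing digests as a multiset rather than as sequences avoids false mismatches when the two drainers deliver the same changes in a different order or across different partitions.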

| username: xfworld

It doesn’t matter… :crazy_face: