[TiDB Usage Environment] Production Environment
[TiDB Version]
[Encountered Problem]
Background: A large amount of our underlying data lives in a domestic PostgreSQL database and needs to be synchronized overseas.
Current Solution: The domestic PostgreSQL replicates to an overseas PostgreSQL, which is then synchronized to an overseas TiDB cluster through DataX.
Problem: The current solution only covers the domestic-to-overseas direction. Later we will also need to synchronize data from the overseas TiDB back to the domestic side.
I would like to discuss two options with everyone. The first is to use a TiDB multi-data-center deployment to keep domestic and overseas data in sync, and to use TiCDC to publish changes to an MQ. Is that feasible for synchronizing data between heterogeneous databases or business systems, both domestically and overseas?
The second is to use Debezium: domestic and overseas synchronization is handled through CDC, and the databases and business teams on each side then consume from the CDC pipeline. Should we consider that instead?
Whether the link is cross-country or cross-data-center, the most troublesome issues are usually the network and latency. Beyond that, whichever CDC mechanism you choose, the following need to be considered:
Data consistency issues when synchronization fails
Whether the business logic can support data validation
In a multi-data-center setup, whether the data carries a data-center marker (which center wrote it) to reduce merging pressure
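The three points above can be sketched together in a few lines. This is only an illustration under assumptions: `ChangeEvent` and `Replica` are hypothetical names, the event shape is not the TiCDC or Debezium format, and it assumes each key is written by only one data center and carries a version that increases at its origin.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; not a real TiCDC or Debezium format."""
    key: str
    value: str
    version: int  # assumed to increase per key at the origin
    origin: str   # data-center marker, e.g. "domestic" or "overseas"


class Replica:
    """In-memory stand-in for the table on one side of the link."""

    def __init__(self, site: str):
        self.site = site
        self.rows = {}  # key -> (value, version, origin)

    def apply(self, ev: ChangeEvent) -> bool:
        """Apply an event idempotently; return True if a row changed."""
        # Events that originated here are loop-backs, so skip them.
        if ev.origin == self.site:
            return False
        current = self.rows.get(ev.key)
        # Skip replays and stale events so a retry after a failure is safe.
        if current is not None and current[1] >= ev.version:
            return False
        self.rows[ev.key] = (ev.value, ev.version, ev.origin)
        return True

    def checksum(self) -> str:
        """Digest over sorted rows, for comparing two sides after a sync."""
        h = hashlib.sha256()
        for key in sorted(self.rows):
            value, version, _origin = self.rows[key]
            h.update(f"{key}\x00{value}\x00{version}\n".encode())
        return h.hexdigest()
```

Replaying the same event after a failed batch leaves the replica unchanged, and comparing `checksum()` on both sides is one cheap way to validate. If both centers can write the same key, a version check alone is not enough and a real conflict rule is needed.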
Because of the network and latency issues, it feels like relying on the database itself for cross-border synchronization is more stable than relying on a CDC mechanism. With a multi-data-center deployment, you can use Placement Rules to separate the two write sources, overseas and domestic, which ensures that at least one node has the latest data.
When you talk about the reliability of asynchronous replication, are you referring to TiDB?
CDC doesn't seem very reliable either, unless every change is acknowledged. But generally speaking, wouldn't concurrency be quite low in that case because of the network latency?
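To put rough numbers on the acknowledgment concern, here is a back-of-the-envelope sketch. The 200 ms round-trip time and the window sizes are assumptions for illustration, `max_events_per_second` is a hypothetical helper, and bandwidth and processing time are ignored.

```python
def max_events_per_second(rtt_seconds: float, in_flight: int) -> float:
    """Upper bound on acknowledged events per second for one stream.

    in_flight is how many events may be sent before an acknowledgment
    is required: 1 models per-event acknowledgment, larger values model
    batching or a sliding window.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    if in_flight < 1:
        raise ValueError("in_flight must be at least 1")
    return in_flight / rtt_seconds


ASSUMED_RTT = 0.2  # assumed cross-border round trip of 200 ms

# One acknowledgment per event: about 5 events per second per stream.
per_event = max_events_per_second(ASSUMED_RTT, 1)

# One acknowledgment per batch of 1000 events: about 5000 per second.
batched = max_events_per_second(ASSUMED_RTT, 1000)
```

So per-event acknowledgment over a high-latency link is indeed slow, but acknowledging per batch, or running several streams in parallel, raises the ceiling without giving up confirmation that the data arrived.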