Real-time Data Synchronization Across Schemas

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 跨schema实时同步数据

| username: 像风一样的男子

[TiDB Usage Environment] Production Environment / Testing / PoC
There is a requirement to synchronize table B in schema A to table D in schema C in real time, within the same database. The table structures of B and D are identical. What is the simplest way to achieve this?

| username: 小龙虾爱大龙虾 | Original post link

Isn’t one copy of the data enough? Why keep a second one?

| username: Kongdom | Original post link

:yum: As a developer, the general approach is as follows:

  1. When writing to A in the program, simultaneously write to C.
  2. Use ETL tools for near real-time synchronization.

Actually, the simplest method is 1 :yum:
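
Purely for illustration, here is a minimal sketch of option 1 in Java/JDBC. Because both schemas live in the same TiDB cluster, a single local transaction can cover both writes. The connection URL, credentials, table names, and columns below are placeholders, not anything from the original post.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DualWrite {
    // Write one row to schema_a.b and schema_c.d in a single local transaction.
    // URL, credentials, and column names are placeholders.
    public static void writeBoth(long id, String payload) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "xxxx")) {
            conn.setAutoCommit(false);
            try (PreparedStatement insB = conn.prepareStatement(
                         "INSERT INTO schema_a.b (id, payload) VALUES (?, ?)");
                 PreparedStatement insD = conn.prepareStatement(
                         "INSERT INTO schema_c.d (id, payload) VALUES (?, ?)")) {
                insB.setLong(1, id);
                insB.setString(2, payload);
                insB.executeUpdate();
                insD.setLong(1, id);
                insD.setString(2, payload);
                insD.executeUpdate();
                conn.commit();    // both rows commit together
            } catch (Exception e) {
                conn.rollback();  // neither table keeps the row on failure
                throw e;
            }
        }
    }
}
```

With both inserts in one transaction, either both rows land or neither does, which avoids the partial-failure problem that writing to two separate databases would raise.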

| username: Miracle | Original post link

If the account corresponding to schema C has permissions for table B in schema A, then it might not be necessary to store two copies.

| username: 像风一样的男子 | Original post link

I didn’t really understand the development team’s requirements; even after a long discussion it still didn’t make sense to me.

| username: 像风一样的男子 | Original post link

The data sources for this table are a bit scattered, which makes it hard to add the extra write everywhere. I’m thinking of using CDC to sync to Kafka and then having Java consume the stream and write into table D, but that feels a bit convoluted and troublesome.
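
For what it’s worth, here is a rough sketch of the consuming side of that pipeline, assuming TiCDC pushes row changes for table B to a Kafka topic. The topic name, message format, connection details, and columns are all placeholders; a real consumer would parse the actual TiCDC output protocol (for example Canal-JSON) and handle deletes, retries, and offset management properly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BToDConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");   // placeholder
        props.put("group.id", "b-to-d-sync");                // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        String url = "jdbc:mysql://127.0.0.1:4000/schema_c?useSSL=false"; // placeholder
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(url, "app_user", "xxxx")) {
            consumer.subscribe(Collections.singletonList("ticdc-schema-a-b")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // In practice, parse the TiCDC message here and map the event to
                    // table D's columns. This sketch assumes the value has already
                    // been reduced to a simple "id,payload" string.
                    String[] parts = record.value().split(",", 2);
                    try (PreparedStatement upsert = conn.prepareStatement(
                            "REPLACE INTO d (id, payload) VALUES (?, ?)")) {
                        upsert.setLong(1, Long.parseLong(parts[0]));
                        upsert.setString(2, parts[1]);
                        upsert.executeUpdate();
                    }
                }
            }
        }
    }
}
```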

| username: 像风一样的男子 | Original post link

The development team wants to store two copies, and the structure of table D might be different in the future.

| username: xfworld | Original post link

The solution you mentioned is feasible:
CDC → Kafka → Java → N tables
or CDC → Kafka → Flink → N tables

Even if the structures differ, it’s not a problem as long as the data can be mapped and populated.

| username: 像风一样的男子 | Original post link

Can a CDC changefeed be created with --sink-uri="mysql://root:xxxx@0.0.0.0:4000/schema C" to synchronize data back into the same database cluster, just into a different schema?
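
For reference, creating a changefeed looks roughly like this (a sketch only; the PD address, credentials, and changefeed name are placeholders, and whether the same cluster can safely act as its own downstream should be verified against the TiCDC documentation for your version). As far as I know, the MySQL sink writes to the same schema and table names as the upstream, so landing the data under a different schema name would still need an extra hop such as the Kafka + consumer route above.

```shell
# Sketch only: addresses, credentials, and names are placeholders.
tiup cdc cli changefeed create \
  --pd="http://127.0.0.1:2379" \
  --sink-uri="mysql://root:xxxx@127.0.0.1:4000/" \
  --changefeed-id="schema-a-b-sync" \
  --config=changefeed.toml

# changefeed.toml -- replicate only table B in schema A
# [filter]
# rules = ['schema_a.b']
```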

| username: 江湖故人 | Original post link

Using Kettle or REPLACE INTO scripts for synchronization is very convenient, but if rows in the source table change status frequently, comparing and syncing the historical changes gets expensive, and the tighter the real-time requirement, the more resources it consumes. CDC is better in terms of real-time performance, but it is somewhat heavy. The lightest option is to create a view, which consumes the least resources, as long as permission isolation is handled properly.
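
Purely as an illustration of the REPLACE INTO script idea, here is a minimal Java/JDBC sketch that copies recently changed rows from schema_a.b into schema_c.d. It assumes an indexed updated_at column on B, which the thread doesn’t mention, and all names and credentials are placeholders. (The view alternative would simply be CREATE VIEW schema_c.d AS SELECT * FROM schema_a.b plus the right grants, if D can be a view rather than a real table.)

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class ReplaceIntoSync {
    // Periodically copy rows changed since the last run from schema_a.b into schema_c.d.
    // Assumes identical structures and an indexed updated_at column; names are placeholders.
    // Note: deletes on B are NOT propagated by this approach.
    public static Timestamp syncSince(Timestamp lastRun) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "sync_user", "xxxx");
             PreparedStatement stmt = conn.prepareStatement(
                     "REPLACE INTO schema_c.d SELECT * FROM schema_a.b WHERE updated_at > ?")) {
            stmt.setTimestamp(1, lastRun);
            stmt.executeUpdate();
            return new Timestamp(System.currentTimeMillis()); // watermark for the next run
        }
    }
}
```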

| username: TiDBer_CQ | Original post link

If it’s possible to ensure that two copies of the data are synchronized during the write process, that would be best. If not, then use a data synchronization tool to achieve it.

| username: zhanggame1 | Original post link

Do you need real-time synchronization?

| username: Kongdom | Original post link

:flushed: Isn’t this something that can be solved with a few lines of code at the application layer?
We usually develop a common write method for this table; all writes have to go through it, and the dual write happens inside that method.
But if, as you said, there are multiple uncontrollable sources, this is hard to solve at the application layer, especially when third parties write to the table.

| username: 江湖故人 | Original post link

Developers transitioning to DBAs definitely have an advantage :laughing:

I remember working on a project involving Oracle → OGG → Kafka → ES, which had checksum and retransmission mechanisms and was quite complex to implement.
Later, a new development lead came in and changed it to dual-writing to Oracle and ES.

| username: zhaokede | Original post link

Synchronize the data asynchronously; the overhead will be smaller.

| username: zhanggame1 | Original post link

Writing to two databases simultaneously involves distributed transactions, which can be troublesome if one succeeds and the other fails.

| username: 像风一样的男子 | Original post link

It would be great if there were a feature like SQL Server’s mirrored tables.

| username: 像风一样的男子 | Original post link

Modifying all the involved code is a very troublesome task, especially for projects with some history.

| username: Kongdom | Original post link

We all use flexible transactions, where one might succeed and another might fail, but there are mechanisms to ensure eventual consistency. :yum:

| username: Kongdom | Original post link

:yum: If you’re on SQL Server, why bother with mirroring? Just use triggers to get it done.