Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 跨schema实时同步数据 (real-time data synchronization across schemas)
[TiDB Usage Environment] Production Environment / Testing / PoC
There is a requirement to synchronize table B in schema A to table D in schema C in real time, within the same database cluster. The table structures of B and D are identical. What is the simplest way to achieve this?
Isn’t one copy of the data enough? Why do you need a second one?
As a developer, the general approach is as follows:
- When writing to A in the program, simultaneously write to C.
- Use ETL tools for near real-time synchronization.
Actually, the simplest method is the first one.
If the account corresponding to schema C has permissions for table B in schema A, then it might not be necessary to store two copies.
I didn’t really understand the development requirement; even after a long discussion it still didn’t make sense to me.
The data sources for this table are scattered, which makes double-writing at the source difficult. I’m thinking of using CDC to synchronize to Kafka and then having Java consume the stream and write into table D, but that’s a bit convoluted and troublesome.
The development team wants to store two copies, and the structure of table D might be different in the future.
The solution you mentioned is feasible:
CDC → Kafka → Java → N tables
or CDC → Kafka → Flink → N tables
A different structure is not a problem either, as long as the data can be populated.
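For the Java leg of that pipeline, a minimal consumer sketch, assuming TiCDC writes to a Kafka topic named tidb-b (topic name, group id, connection details, and the single payload column are illustrative, not from this thread):

```java
// Minimal sketch: consume TiCDC change events from Kafka and upsert into
// table D. All names and connection details here are placeholders.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class BToDConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "b-to-d");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://127.0.0.1:4000/schema_C", "root", "xxxx")) {
            consumer.subscribe(Collections.singletonList("tidb-b"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // In practice, parse the TiCDC event (e.g. canal-json)
                    // into D's real columns; a single placeholder column
                    // keeps the sketch short.
                    try (PreparedStatement ps = conn.prepareStatement(
                            "REPLACE INTO D (payload) VALUES (?)")) {
                        ps.setString(1, r.value());
                        ps.executeUpdate();
                    }
                }
            }
        }
    }
}
```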
Can CDC create a task configuration with --sink-uri="mysql://root:xxxx@0.0.0.0:4000/schema C" to synchronize data back to the same database cluster but to a different schema?
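For reference, I mean something roughly like this (PD address, credentials, and schema names are placeholders; the flags and filter syntax are standard TiCDC, but as far as I know the MySQL sink keeps the source schema and table names, so whether the rows can land in a different schema on the same cluster is exactly the open question):

```shell
# changefeed.toml: replicate only schema_A.B
cat > changefeed.toml <<'EOF'
[filter]
rules = ['schema_A.B']
EOF

# Create the changefeed; the sink points back at the same TiDB cluster.
# Careful: without a rename, this risks writing back into schema_A.B
# itself and re-capturing those writes in a loop.
cdc cli changefeed create \
  --pd="http://127.0.0.1:2379" \
  --sink-uri="mysql://root:xxxx@0.0.0.0:4000/" \
  --config=changefeed.toml
```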
Using Kettle or replace-into scripts for synchronization is very convenient, but if the source table changes status frequently, the cost of comparing and re-syncing historical changes can be quite high, and the tighter the real-time requirement, the more resources it consumes. CDC is better in terms of real-time performance, but it is somewhat heavyweight. The lightest approach is to create a view, which consumes the least resources, as long as permissions are properly isolated.
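For example, a sketch of both the script-style sync and the view, assuming the schemas are literally named schema_A and schema_C and that B and D have identical columns:

```sql
-- Script-style sync, run periodically (note: REPLACE INTO does not
-- propagate deletes from the source table):
REPLACE INTO schema_C.D SELECT * FROM schema_A.B;

-- The lightest option: no second copy at all, just a view over the
-- live table (only one of these two can exist as schema_C.D):
CREATE VIEW schema_C.D AS SELECT * FROM schema_A.B;
```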
If you can ensure both copies are written as part of the write path itself, that would be best. If not, use a data synchronization tool to achieve it.
Do you need real-time synchronization?
Isn’t this something that can be solved with a few lines of code at the application layer?
We usually develop a common write method for such a table; all writes must go through it, and the synchronous double write happens inside that method. But if, as you said, there are multiple uncontrollable sources, this is hard to solve at the application layer, especially when third parties write directly.
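A minimal sketch of such a common write method, assuming JDBC and illustrative columns (id, payload); since B and D live in the same TiDB cluster, one ordinary transaction covers both writes, so no distributed transaction is needed:

```java
// Minimal sketch of a single common write path. Class, method, and
// column names are illustrative.
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class DualWriter {
    private final DataSource ds;

    public DualWriter(DataSource ds) { this.ds = ds; }

    // Every writer calls this instead of inserting into schema_A.B
    // directly, so the two copies stay in step.
    public void writeRecord(long id, String payload) throws Exception {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            for (String table : new String[]{"schema_A.B", "schema_C.D"}) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "REPLACE INTO " + table + " (id, payload) VALUES (?, ?)")) {
                    ps.setLong(1, id);
                    ps.setString(2, payload);
                    ps.executeUpdate();
                }
            }
            conn.commit(); // both rows or neither
        }
    }
}
```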
Developers transitioning to DBAs definitely have an advantage 
I remember working on a project with an Oracle → OGG → Kafka → ES pipeline, which had checksum and retransmission mechanisms and was quite complex to implement.
Later, a new development leader came in and changed it to write to both Oracle and ES simultaneously.
Synchronize the data asynchronously; the overhead will be smaller.
Writing to two databases simultaneously involves distributed transactions, which can be troublesome if one succeeds and the other fails.
It would be great if there were a feature like SQL Server’s mirrored tables.
Modifying all the involved code is a very troublesome task, especially for projects with some history.
We all use flexible transactions: one write might succeed while another fails, but there are mechanisms to ensure eventual consistency.
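One common such mechanism is a local message (outbox) table; a minimal sketch under assumed names (schema_A.outbox with a row_id column), not the poster’s actual design. The write to B and the pending-sync marker commit atomically, and a background worker keeps retrying the copy into D until it succeeds:

```java
// Minimal sketch of the outbox pattern for eventual consistency.
// Table and column names are illustrative placeholders.
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class OutboxSync {
    private final DataSource ds;

    public OutboxSync(DataSource ds) { this.ds = ds; }

    // Step 1: write to B and record the pending sync in one transaction.
    public void write(long id, String payload) throws Exception {
        try (Connection c = ds.getConnection()) {
            c.setAutoCommit(false);
            try (PreparedStatement ps = c.prepareStatement(
                    "REPLACE INTO schema_A.B (id, payload) VALUES (?, ?)")) {
                ps.setLong(1, id);
                ps.setString(2, payload);
                ps.executeUpdate();
            }
            try (PreparedStatement ps = c.prepareStatement(
                    "REPLACE INTO schema_A.outbox (row_id) VALUES (?)")) {
                ps.setLong(1, id);
                ps.executeUpdate();
            }
            c.commit();
        }
    }

    // Step 2: background worker drains the outbox; safe to re-run after
    // a failure because the copy is idempotent (REPLACE INTO).
    public void drainOnce() throws Exception {
        try (Connection c = ds.getConnection();
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT row_id FROM schema_A.outbox LIMIT 100")) {
            while (rs.next()) {
                long id = rs.getLong(1);
                try (PreparedStatement copy = c.prepareStatement(
                        "REPLACE INTO schema_C.D SELECT * FROM schema_A.B WHERE id = ?");
                     PreparedStatement done = c.prepareStatement(
                             "DELETE FROM schema_A.outbox WHERE row_id = ?")) {
                    copy.setLong(1, id);
                    copy.executeUpdate();
                    done.setLong(1, id);
                    done.executeUpdate();
                }
            }
        }
    }
}
```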
If you have SQL Server, why bother with mirroring? Just use triggers to get it done.
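A minimal T-SQL sketch of that trigger idea, with illustrative table and column names (note that TiDB itself does not support triggers, so this only applies on SQL Server):

```sql
-- Mirror new rows from dbo.B into dbo.D as part of the same statement.
-- UPDATE and DELETE would need their own triggers.
CREATE TRIGGER trg_B_to_D ON dbo.B
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.D (id, payload)
    SELECT id, payload FROM inserted;  -- "inserted" holds the new rows
END;
```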