Real-time Data Synchronization Across Schemas

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 跨schema实时同步数据

| username: 像风一样的男子

[TiDB Usage Environment] Production Environment / Testing / PoC
There is a requirement to synchronize table B in schema A to table D in schema C in real time, within the same database. The table structures of B and D are identical. What is the simplest way to achieve this?

| username: 小龙虾爱大龙虾 | Original post link

Isn’t one copy of the data enough? Why keep a second one?

| username: Kongdom | Original post link

:yum: As a developer, the general approach is as follows:

  1. When writing to A in the program, simultaneously write to C.
  2. Use ETL tools for near real-time synchronization.

Actually, the simplest method is 1 :yum:
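
Purely for illustration, here is a minimal sketch of option 1 in Java/JDBC. Because both schemas live in the same TiDB cluster, a single local transaction can cover both writes. The connection URL, credentials, table names, and columns below are placeholders, not anything from the original post.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DualWrite {
    // Write one row to schema_a.b and schema_c.d in a single local transaction.
    // URL, credentials, and column names are placeholders.
    public static void writeBoth(long id, String payload) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "xxxx")) {
            conn.setAutoCommit(false);
            try (PreparedStatement insB = conn.prepareStatement(
                         "INSERT INTO schema_a.b (id, payload) VALUES (?, ?)");
                 PreparedStatement insD = conn.prepareStatement(
                         "INSERT INTO schema_c.d (id, payload) VALUES (?, ?)")) {
                insB.setLong(1, id);
                insB.setString(2, payload);
                insB.executeUpdate();
                insD.setLong(1, id);
                insD.setString(2, payload);
                insD.executeUpdate();
                conn.commit();    // both rows commit together
            } catch (Exception e) {
                conn.rollback();  // neither table keeps the row on failure
                throw e;
            }
        }
    }
}
```

With both inserts in one transaction, either both rows land or neither does, which avoids the partial-failure problem that writing to two separate databases would raise.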

| username: Miracle | Original post link

If the account corresponding to schema C has permissions for table B in schema A, then it might not be necessary to store two copies.

| username: 像风一样的男子 | Original post link

I didn’t really understand the development team’s requirements; even after a long discussion it still didn’t make sense to me.

| username: 像风一样的男子 | Original post link

The data sources for this table are a bit scattered, which makes it hard to add the extra write everywhere. I’m thinking of using CDC to sync to Kafka and then having Java consume the stream and write into table D, but that feels a bit convoluted and troublesome.
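
For what it’s worth, here is a rough sketch of the consuming side of that pipeline, assuming TiCDC pushes row changes for table B to a Kafka topic. The topic name, message format, connection details, and columns are all placeholders; a real consumer would parse the actual TiCDC output protocol (for example Canal-JSON) and handle deletes, retries, and offset management properly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BToDConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");   // placeholder
        props.put("group.id", "b-to-d-sync");                // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        String url = "jdbc:mysql://127.0.0.1:4000/schema_c?useSSL=false"; // placeholder
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(url, "app_user", "xxxx")) {
            consumer.subscribe(Collections.singletonList("ticdc-schema-a-b")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // In practice, parse the TiCDC message here and map the event to
                    // table D's columns. This sketch assumes the value has already
                    // been reduced to a simple "id,payload" string.
                    String[] parts = record.value().split(",", 2);
                    try (PreparedStatement upsert = conn.prepareStatement(
                            "REPLACE INTO d (id, payload) VALUES (?, ?)")) {
                        upsert.setLong(1, Long.parseLong(parts[0]));
                        upsert.setString(2, parts[1]);
                        upsert.executeUpdate();
                    }
                }
            }
        }
    }
}
```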

| username: 像风一样的男子 | Original post link

The development team wants to store two copies, and the structure of table D might be different in the future.

| username: xfworld | Original post link

The solution you mentioned is feasible:
CDC → Kafka → Java → N tables
or CDC → Kafka → Flink → N tables

Even if the structures differ, it’s not a problem as long as the data can be mapped and populated.

| username: 像风一样的男子 | Original post link

Can a CDC changefeed be created with --sink-uri="mysql://root:xxxx@0.0.0.0:4000/schema C" to synchronize data back into the same database cluster, just into a different schema?
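
For reference, creating a changefeed looks roughly like this (a sketch only; the PD address, credentials, and changefeed name are placeholders, and whether the same cluster can safely act as its own downstream should be verified against the TiCDC documentation for your version). As far as I know, the MySQL sink writes to the same schema and table names as the upstream, so landing the data under a different schema name would still need an extra hop such as the Kafka + consumer route above.

```shell
# Sketch only: addresses, credentials, and names are placeholders.
tiup cdc cli changefeed create \
  --pd="http://127.0.0.1:2379" \
  --sink-uri="mysql://root:xxxx@127.0.0.1:4000/" \
  --changefeed-id="schema-a-b-sync" \
  --config=changefeed.toml

# changefeed.toml -- replicate only table B in schema A
# [filter]
# rules = ['schema_a.b']
```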

| username: 江湖故人 | Original post link

Using Kettle or REPLACE INTO scripts for synchronization is very convenient, but if rows in the source table change status frequently, comparing and syncing the historical changes gets expensive, and the tighter the real-time requirement, the more resources it consumes. CDC is better in terms of real-time performance, but it is somewhat heavy. The lightest option is to create a view, which consumes the least resources, as long as permission isolation is handled properly.
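
Purely as an illustration of the REPLACE INTO script idea, here is a minimal Java/JDBC sketch that copies recently changed rows from schema_a.b into schema_c.d. It assumes an indexed updated_at column on B, which the thread doesn’t mention, and all names and credentials are placeholders. (The view alternative would simply be CREATE VIEW schema_c.d AS SELECT * FROM schema_a.b plus the right grants, if D can be a view rather than a real table.)

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class ReplaceIntoSync {
    // Periodically copy rows changed since the last run from schema_a.b into schema_c.d.
    // Assumes identical structures and an indexed updated_at column; names are placeholders.
    // Note: deletes on B are NOT propagated by this approach.
    public static Timestamp syncSince(Timestamp lastRun) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "sync_user", "xxxx");
             PreparedStatement stmt = conn.prepareStatement(
                     "REPLACE INTO schema_c.d SELECT * FROM schema_a.b WHERE updated_at > ?")) {
            stmt.setTimestamp(1, lastRun);
            stmt.executeUpdate();
            return new Timestamp(System.currentTimeMillis()); // watermark for the next run
        }
    }
}
```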

| username: TiDBer_CQ | Original post link

If it’s possible to ensure that two copies of the data are synchronized during the write process, that would be best. If not, then use a data synchronization tool to achieve it.

| username: zhanggame1 | Original post link

Do you need real-time synchronization?

| username: Kongdom | Original post link

:flushed: Isn’t this something that can be solved with a few lines of code at the application layer?
We usually develop a common write method for this table; all writes have to go through it, and the dual write happens inside that method.
But if, as you said, there are multiple uncontrollable sources, this is hard to solve at the application layer, especially when third parties write to the table.

| username: 江湖故人 | Original post link

Developers transitioning to DBAs definitely have an advantage :laughing:

I remember working on a project involving Oracle → OGG → Kafka → ES, which had checksum and retransmission mechanisms and was quite complex to implement.
Later, a new development lead came in and changed it to dual-writing to Oracle and ES.

| username: zhaokede | Original post link

Synchronize the data asynchronously; the overhead will be smaller.

| username: zhanggame1 | Original post link

Writing to two databases simultaneously involves distributed transactions, which can be troublesome if one succeeds and the other fails.

| username: 像风一样的男子 | Original post link

It would be great if there were a feature like SQL Server’s mirrored tables.

| username: 像风一样的男子 | Original post link

Modifying all the involved code is a very troublesome task, especially for projects with some history.

| username: Kongdom | Original post link

We all use flexible transactions, where one might succeed and another might fail, but there are mechanisms to ensure eventual consistency. :yum:

| username: Kongdom | Original post link

:yum: If you’re on SQL Server, why bother with mirroring? Just use triggers to get it done.