Can TiDB use the Otter tool to extract and synchronize to another TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb能用otter工具来抽取同步到另外一个tidb吗?

| username: 舞动梦灵

I want to synchronize several large tables from a cloud TiDB to a local TiDB. Can I use the Otter synchronization tool for this? As I understand it, Otter works by acting as a MySQL slave to fetch binlogs. TiDB also has binlogs, but I’m not sure whether Otter can use them.

| username: 啦啦啦啦啦 | Original post link

It should not be possible; the binlog formats are different. For direct synchronization between two TiDB clusters, you can use TiCDC.
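
If it helps, here is a minimal sketch of creating a TiCDC changefeed from a Python script. The PD address, sink URI, and changefeed ID are placeholders, and the exact flags should be checked against the TiCDC documentation for your version.

```python
import subprocess

# Placeholder endpoints -- replace with your own PD address and downstream TiDB.
PD_ADDR = "http://cloud-pd:2379"                  # PD of the upstream (cloud) cluster
SINK_URI = "mysql://root:***@local-tidb:4000/"    # downstream (local) TiDB

# Create a changefeed that replicates the upstream cluster into the local TiDB.
subprocess.run(
    [
        "cdc", "cli", "changefeed", "create",
        f"--pd={PD_ADDR}",
        f"--sink-uri={SINK_URI}",
        "--changefeed-id=cloud-to-local",
    ],
    check=True,
)
```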

| username: 舞动梦灵 | Original post link

I want to synchronize some tables and preferably filter out delete operations. TiCDC is only recommended from version 4.0.6, but my version is 4.0.2, which is too old, so I’m afraid there might be issues.

| username: 啦啦啦啦啦 | Original post link

Using TiDB Binlog for synchronization should also work on the old version.

| username: dba远航 | Original post link

Keeping the TiDB versions (MySQL compatibility versions) consistent, it’s worth a try; it should work.

| username: WalterWj | Original post link

A binlog-based approach won’t work, but a logical one can.

| username: 舞动梦灵 | Original post link

Are there any tools or methods for logical synchronization? Tools for synchronizing and transferring data across servers?

| username: Kongdom | Original post link

How large is the table? I usually perform logical backups.

| username: 舞动梦灵 | Original post link

How do I perform a logical backup? With Dumpling and Lightning? The tables are over a terabyte, and there are more than a dozen of them. The plan is to use BR to back up the entire database and import it locally first, then synchronize only the tables larger than 200 GB: each day, query the data older than 60 days by date and insert it into the local database, then delete that historical data from the cloud tables.
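
For the daily piece of that plan, here is a minimal sketch of the "copy one day of 60-day-old data, then purge it upstream" step. It assumes pymysql as the client, a hypothetical `create_time` date column, and placeholder table names; in practice the batch size and the delete step would need tuning for TiDB's transaction size limits.

```python
import pymysql  # assumed client library; any MySQL-compatible driver works with TiDB
from datetime import date, timedelta

# Hypothetical connection details, table names, and date column -- adjust to your schema.
cloud = pymysql.connect(host="cloud-tidb", port=4000, user="root", password="***", database="app")
local = pymysql.connect(host="local-tidb", port=4000, user="root", password="***", database="app")

TABLES = ["orders", "order_items"]   # the >200 GB tables to archive
BATCH = 5000                         # keep each downstream transaction small

def archive_one_day(table, day):
    """Copy one day's rows from the cloud cluster to the local one, then purge them upstream."""
    # Use a streaming cursor so a full day's data is not buffered in memory at once.
    with cloud.cursor(pymysql.cursors.SSCursor) as src:
        src.execute(
            f"SELECT * FROM {table} WHERE create_time >= %s AND create_time < %s + INTERVAL 1 DAY",
            (day, day),
        )
        cols = [d[0] for d in src.description]
        insert_sql = (
            f"INSERT INTO {table} ({', '.join(cols)}) "
            f"VALUES ({', '.join(['%s'] * len(cols))})"
        )
        while True:
            rows = src.fetchmany(BATCH)
            if not rows:
                break
            with local.cursor() as dst:
                dst.executemany(insert_sql, rows)
            local.commit()
    # Only purge the cloud side after the local copy has been committed.
    # For very large days this DELETE should itself be batched.
    with cloud.cursor() as src:
        src.execute(
            f"DELETE FROM {table} WHERE create_time >= %s AND create_time < %s + INTERVAL 1 DAY",
            (day, day),
        )
    cloud.commit()

if __name__ == "__main__":
    target_day = date.today() - timedelta(days=60)
    for t in TABLES:
        archive_one_day(t, target_day)
```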

| username: 舞动梦灵 | Original post link

The current idea is:

  1. Have the developers do it: extract the data and insert it locally. But it seems the developers are unwilling.
  2. Otherwise we can only use Dumpling to export the specified tables and dates, transfer the files locally, and schedule the inserts (see the sketch after this list).
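
As a sketch of option 2, the following drives Dumpling with a date-based `--where` filter. The host, credentials, table list, and `create_time` column are assumptions; the import side (TiDB Lightning, or replaying the SQL files with a MySQL client) is only noted in a comment.

```python
import subprocess
from datetime import date, timedelta

# Hypothetical settings -- adjust host, credentials, table list, and date column.
CLOUD_HOST = "cloud-tidb"
TABLES = "app.orders,app.order_items"   # dumpling -T takes a comma-separated db.table list
OUT_DIR = "/data/dump"

# Export exactly one day of data from 60 days ago; --where applies to every exported table.
day = date.today() - timedelta(days=60)
where = f"create_time >= '{day}' AND create_time < '{day + timedelta(days=1)}'"

subprocess.run(
    [
        "dumpling",
        "-h", CLOUD_HOST, "-P", "4000", "-u", "root", "-p", "***",
        "--filetype", "sql",
        "-T", TABLES,
        "--where", where,
        "-o", f"{OUT_DIR}/{day}",
    ],
    check=True,
)
# The dumped files can then be shipped to the local machine, imported with TiDB Lightning
# (or replayed with a MySQL client), and the whole script scheduled from cron.
```
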
| username: Kongdom | Original post link

:sweat_smile: Over a terabyte? That’s almost the size of my entire database here. Better go with a BR physical backup.

| username: 舞动梦灵 | Original post link

What I’m thinking now is to migrate with BR first, and then, for the specific tables, regularly back up one day’s worth of data from 60 days ago and synchronize it, moving one day of data each day.

| username: Kongdom | Original post link

BR full backup + TiCDC incremental backup should be considered the standard solution.
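
For the BR half of that, a minimal sketch of the full backup and restore pair driven from Python; the PD endpoints and storage path are placeholders, and the BR binary needs to match the cluster version (v4.0.x here).

```python
import subprocess

# Placeholder endpoints and paths.
UPSTREAM_PD = "http://cloud-pd:2379"
DOWNSTREAM_PD = "http://local-pd:2379"
BACKUP_STORAGE = "local:///backup/full"   # must be reachable by every TiKV node

# Full backup on the cloud cluster.
subprocess.run(
    ["br", "backup", "full", "--pd", UPSTREAM_PD, "--storage", BACKUP_STORAGE],
    check=True,
)

# After copying the backup files to storage reachable by the local cluster, restore them.
subprocess.run(
    ["br", "restore", "full", "--pd", DOWNSTREAM_PD, "--storage", BACKUP_STORAGE],
    check=True,
)
```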

| username: swino | Original post link

Although they are both called binlog, the storage formats may not be the same. However, you can give it a try.

| username: 舞动梦灵 | Original post link

In version 4.0, TiCDC real-time synchronization cannot filter out deletes; it seems only version 6.5 and later can do that, and there is related documentation on it. I need to delete historical data from the source side every day, but the target side should not delete it and should retain everything.
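
For reference only (not usable on 4.0), a sketch of what delete filtering looks like with TiCDC event filters on the newer versions that support them; the table name and endpoints are placeholders.

```python
import subprocess

# Changefeed config using event filters (newer TiCDC versions only); table name is a placeholder.
changefeed_toml = """\
[filter]
rules = ['app.orders']

[[filter.event-filters]]
matcher = ['app.orders']
ignore-event = ['delete']   # replicate inserts/updates but drop deletes
"""

with open("changefeed.toml", "w") as f:
    f.write(changefeed_toml)

subprocess.run(
    [
        "cdc", "cli", "changefeed", "create",
        "--pd=http://cloud-pd:2379",
        "--sink-uri=mysql://root:***@local-tidb:4000/",
        "--config=changefeed.toml",
    ],
    check=True,
)
```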

| username: WalterWj | Original post link

TiDB to TiDB is already supported. For version 4, use TiDB Binlog with Pump + Drainer.

| username: Kongdom | Original post link

:thinking: I indeed did not notice this and do not have practical experience in this area.

| username: 舞动梦灵 | Original post link

I read the TiDB Binlog documentation, and there are parameters to specify which tables to sync, but I didn’t see any option to filter out delete operations.
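
For reference, a sketch of the table-level filters described in the Drainer configuration docs; the database and table names are placeholders. These choose which tables get replicated, but as noted there is no knob here for dropping only delete events.

```python
# Example [syncer] filter section for drainer.toml; names are placeholders.
drainer_filter_example = """\
[syncer]
replicate-do-db = ["app"]

[[syncer.replicate-do-table]]
db-name = "app"
tbl-name = "orders"
"""

# Merge this into your drainer.toml (shown here just written to a scratch file).
with open("drainer-filter-example.toml", "w") as f:
    f.write(drainer_filter_example)
```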

| username: forever | Original post link

This is a standard T+N scenario. You can use DataX; there’s no need for a real-time approach, especially since delete operations don’t need to be replicated.

| username: 舞动梦灵 | Original post link

Yes, exactly. I just want to set something up. If it can be real-time with delete filtering, that would be great; if not, it doesn’t need to be real-time, and periodic synchronization once a day would be fine. Ideally it should run automatically.