【TiDB Usage Environment】Production / Test / POC
【TiDB Version】v5.4.2
【Encountered Problem】The new cluster was initialized from a backup-and-restore of the old cluster, and incremental data is being synchronized through TiDB Binlog. The current synchronization status between the two clusters is unknown, so we cannot tell whether the incremental data has been fully replicated.
【Reproduction Path】
Backup data on the old cluster;
Restore data on the new cluster;
Deploy drainer on the old cluster to synchronize incremental data to the new cluster.
【Problem Phenomenon and Impact】How can we determine whether the incremental data synchronization is complete?
【Attachment】
Below is the data from the new cluster’s tidb_binlog.checkpoint:
mysql> select * from checkpoint;
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| clusterID           | checkPoint                                                                                                                                   |
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| 7141199591139448285 | {"consistent":true,"commitTS":435955253777793025,"ts-map":{"master-ts":435899089729617923,"slave-ts":435899515558690818},"schema-version":0} |
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
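For reference, the commitTS in this JSON should be the commit TSO of the last upstream transaction the drainer has applied downstream. A minimal sketch to decode it into wall-clock time, assuming the JSON layout shown above and that JSON functions behave as in MySQL (column names as in the table):

-- Decode the commitTS stored in tidb_binlog.checkpoint.
-- JSON_EXTRACT pulls the field out of the JSON text; TIDB_PARSE_TSO converts
-- the 64-bit TSO into a timestamp.
SELECT clusterID,
       TIDB_PARSE_TSO(CAST(JSON_EXTRACT(checkPoint, '$.commitTS') AS UNSIGNED)) AS last_replicated_at
FROM tidb_binlog.checkpoint;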
Thank you. This is a good way to check the time difference between master and slave, but it has a limitation: it only shows that the times are consistent, and only to second precision. Even if the times match, that does not directly prove that the data in the two clusters is consistent.
It would be better if there were something like master-ts = slave-ts, that is, a place showing which of the master's TSOs the slave has synchronized up to. As it stands, the TSOs of the two clusters are obviously very far apart.
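For what it's worth, the two ts-map values are TSOs issued by two different PD clusters, so their raw numbers will never be equal; as far as I understand, what ts-map records is the correspondence itself: the upstream data as of master-ts matches the downstream data as of slave-ts. You can decode both to see the wall-clock time each one encodes (values taken from the checkpoint above):

-- TIDB_PARSE_TSO extracts the physical (wall-clock) part of each TSO.
SELECT TIDB_PARSE_TSO(435899089729617923) AS master_time,
       TIDB_PARSE_TSO(435899515558690818) AS slave_time;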
Use select timediff(now(),tidb_parse_tso(substr(checkpoint,31,18))) from tidb_binlog.checkpoint; to confirm that most of the incremental data has reached the replica (if the database is large, backing up the primary and restoring the replica can take a long time); see the annotated sketch after these steps;
Stop writing to the primary database;
Use sync-diff-inspector to perform data consistency checks.
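An annotated version of the lag check from the first step, to be run on the downstream cluster:

-- substr(checkPoint, 31, 18) pulls the 18-digit commitTS out of the JSON
-- (the offsets assume the exact layout shown in the attachment);
-- tidb_parse_tso converts it to a timestamp, and timediff against now()
-- approximates the replication lag. A result close to 00:00:00 means the
-- drainer has nearly caught up.
SELECT TIMEDIFF(NOW(), TIDB_PARSE_TSO(SUBSTR(checkPoint, 31, 18))) AS replication_lag
FROM tidb_binlog.checkpoint;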
You can compare directly based on the checkpoint value. As long as the data up to that point is consistent, you can compare again after some time has passed. There should be no need to stop writes on the primary, right?
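One way to sketch that kind of spot check without stopping writes: read each cluster "as of" the matching TSOs recorded in ts-map, so both reads see the same logical data state. The table test.t below is a placeholder, and this only works while GC still retains the versions at those TSOs:

-- On the upstream cluster: read as of the master-ts from ts-map.
SET @@tidb_snapshot = '435899089729617923';
SELECT COUNT(*) FROM test.t;  -- test.t is a hypothetical table

-- On the downstream cluster: read as of the matching slave-ts.
SET @@tidb_snapshot = '435899515558690818';
SELECT COUNT(*) FROM test.t;

-- Clear the snapshot to return to normal reads.
SET @@tidb_snapshot = '';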
Hmm, let me ask:
What are the upstream and downstream clusters doing, respectively? Is the upstream currently serving business traffic?
Is the downstream only synchronizing, with no business traffic on it?
Do you want to switch the business from the upstream to the downstream?
Two data centers: the upstream carries the business traffic, and the downstream is a newly built cluster, so backup-restore plus incremental synchronization needs to be performed.
Data verification takes a certain amount of time. Can you accept stopping business writes for that whole window? Wouldn't it be better to verify first, and only then stop writes and switch?
You can specify a snapshot and validate a chosen range of data, then stop the business and switch databases once the validation passes. That keeps the business downtime particularly short. The catch is that for any data written after the validation snapshot is taken, you cannot know whether it is consistent until it is checked as well.
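Roughly what sync-diff-inspector automates, hand-rolled for a single range at a fixed snapshot (the table test.orders and its columns are hypothetical; run with the corresponding TSO on each cluster):

-- Checksum one id range at a fixed snapshot; equal row counts and CRCs on
-- both clusters suggest the range is consistent. Note CONCAT_WS skips NULLs,
-- so a real check would wrap nullable columns in IFNULL.
SET @@tidb_snapshot = '435955253777793025';
SELECT COUNT(*) AS row_cnt,
       BIT_XOR(CRC32(CONCAT_WS('#', id, status, updated_at))) AS range_crc
FROM test.orders
WHERE id BETWEEN 1 AND 1000000;
SET @@tidb_snapshot = '';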