Migration Questions for TiDB 4.0.2 and 4.0.9 Clusters

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB4.0.2和4.0.9版本迁移问题

| username: 舞动梦灵

We currently have two TiDB environments, 4.0.2 and 4.0.9, and we are preparing to migrate them from Alibaba Cloud to another cloud environment. According to the official documentation, when using BR for backup and restore, restoring with BR 5.0 is fully compatible, while restoring with the original 4.0 version has a small bug.

I have three questions:

  1. If we use BR 4.0 to back up a TiDB 4.0 version and use BR 5.0 to restore it to a 5.0.0 version cluster, will there be any other issues during the migration process?
  2. Can TiDB versions 4.0.2 and 4.0.9 be migrated using the scale-out and scale-in method, and are there any issues with using this method?
  3. Can TiDB’s DM tool migrate TiDB to another cluster?

| username: 大飞哥online | Original post link

For cross-cloud migration, pay attention to the network speed between them. Cross-cloud transmission is not recommended. If downtime is acceptable, you can do a full backup of the cluster data and then restore it in the new environment.

| username: 舞动梦灵 | Original post link

The business cannot stop; even a 10-minute downtime requires a formal request. A friend mentioned that their company migrated by scaling out and in on version 5.0. Version 4.0 is much older, so I’m not sure it will work. I’m asking whether anyone has successfully migrated this way on version 4.0 and whether it might cause any other issues.

| username: 大飞哥online | Original post link

  1. Using BR 4.0 to back up TiDB 4.0 and then using BR 5.0 to restore to a 5.0.0 version cluster, will there be any issues during the migration process?
    Answer: There won’t be any issues with the same version. BR can perform a full backup and restore directly without problems, but you need to shut down the system.

  2. Can TiDB 4.0.2 and 4.0.9 versions be migrated using the scale-out and scale-in method? Are there any issues with this method?
    Answer: This method is slow and will affect the performance of the current business. It hasn’t been done with lower versions, so testing and verification are needed.

  3. Can TiDB’s DM tool migrate TiDB to another cluster?
    Answer: Yes, it can.

| username: 大飞哥online | Original post link

You can take a full backup and restore it to the new cluster, then set up CDC-style synchronization to replicate new data in real time. When you are ready to switch, just point the application at the new cluster’s IP.

| username: 像风一样的男子 | Original post link

If the business cannot be stopped and you want to migrate through scaling, you need to ensure that the networks of the two cloud environments are connected with very low latency and sufficient bandwidth. Otherwise, you can only apply for downtime and migrate data through BR. Choose one of the two options and prepare a plan for leadership approval.

| username: 大飞哥online | Original post link

Yes: write the plan up well, cover the risk points clearly, and leave the rest to the leadership’s decision. :grin:

| username: 舞动梦灵 | Original post link

I’m aware of the approach you mentioned: use BR for a full backup and restore, then CDC for real-time synchronization, and finally switch the IP. You said, “Answer: There won’t be any issues with the same version. BR can perform a full backup and restore directly without problems, but you need to shut down the system.” The downtime is needed on the restore target, right? The BR backup source doesn’t need to be stopped, does it?

For the third DM, if I directly deploy a DM server on a certain cloud and then synchronize from the source to the target, can it perform a full synchronization and then automatically switch to real-time synchronization?

This seems more labor-saving, but in the official DM documentation the data source configuration file contains relay-binlog-name: "" # The starting file name of the upstream binlog to pull. If not specified, it will start synchronizing from the latest binlog. However, the task file used when creating a task has task-mode: "all", which indicates both full and incremental synchronization. I’m a bit confused. If no binlog is specified for the data source and the task is created in all mode, does it first perform a full migration and then synchronize incrementally, or does it only synchronize data from the latest binlog onward?

Thank you.

| username: 舞动梦灵 | Original post link

Does BR require shutting the cluster down? I haven’t looked into the specific BR operations yet. My plan is to work out which solution is the least hassle and then follow the documentation to do a test run. In the migration plan I read, BR backs up to object storage and records a timestamp during the backup, and then CDC starts real-time synchronization from that timestamp. I didn’t see any step requiring a shutdown.
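The flow described here (BR full backup, read the backup timestamp, restore, then start CDC from that timestamp) can be sketched roughly as below. This is only an outline under stated assumptions: the PD addresses, storage URL, credentials, and timestamp are hypothetical placeholders, and the flag names are the ones I recall from the BR and TiCDC documentation, so check them against `--help` for your exact versions before relying on them. The script only builds and prints the commands; it does not run anything.

```python
import shlex

# Hypothetical placeholders: replace with your own addresses and storage.
OLD_PD = "10.0.1.10:2379"                      # PD of the source cluster
NEW_PD = "10.1.1.10:2379"                      # PD of the target cluster
STORAGE = "s3://example-bucket/tidb-backup"    # backup location
SINK_URI = "mysql://root:password@10.1.1.20:4000/"  # target TiDB endpoint


def br_full_backup(pd_addr, storage):
    """Step 1: full (hot) backup of the source cluster."""
    return ["br", "backup", "full", "--pd", pd_addr, "--storage", storage]


def br_read_backup_ts(storage):
    """Step 2: read the backup's end-version (its snapshot timestamp)."""
    return ["br", "validate", "decode", "--field=end-version",
            "--storage", storage]


def br_full_restore(pd_addr, storage):
    """Step 3: restore the backup into the target cluster."""
    return ["br", "restore", "full", "--pd", pd_addr, "--storage", storage]


def cdc_create_changefeed(pd_addr, sink_uri, start_ts):
    """Step 4: replicate changes made after the backup timestamp."""
    return ["cdc", "cli", "changefeed", "create",
            f"--pd=http://{pd_addr}",
            f"--sink-uri={sink_uri}",
            f"--start-ts={start_ts}"]


def migration_plan(backup_ts):
    """Return the four commands in the order they would be run."""
    return [
        br_full_backup(OLD_PD, STORAGE),
        br_read_backup_ts(STORAGE),
        br_full_restore(NEW_PD, STORAGE),
        cdc_create_changefeed(OLD_PD, SINK_URI, backup_ts),
    ]


if __name__ == "__main__":
    # 400000000000000000 is a placeholder; use the value printed by step 2.
    for cmd in migration_plan(backup_ts=400000000000000000):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Note that the changefeed is created against the source cluster’s PD, since TiCDC reads changes from the source and writes them to the target named in the sink URI.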

| username: 像风一样的男子 | Original post link

BR performs hot backups, so it can be used without shutting anything down, but to prevent any new data from being written at the source, it’s simplest to stop the business. Once the data migration is complete, you can switch over directly. Since you’re moving the entire data center, it’s definitely more than just a data migration. You can coordinate with the other applications and migrate together.

| username: 舞动梦灵 | Original post link

Yes, for BR, it can only be BR hot backup + CDC real-time synchronization, and then find the right time to migrate the application.

| username: 舞动梦灵 | Original post link

Brother Dafei, can the DM synchronization tool migrate a TiDB cluster? And can TiCDC be used directly as a migration tool between TiDB clusters? The description I see only mentions pulling TiKV change logs for synchronization.

| username: 大飞哥online | Original post link

BR is a hot backup and does not require downtime. The downtime is to prevent new data from being generated. The old data center is shut down for a full BR backup, then restored to the new data center. The application can directly point to the new data center’s IP or domain name. During the backup and restore process, new data cannot be entered into the database. This needs to be evaluated.

DM can be either full or incremental, depending on the configuration.
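To make “depending on the configuration” concrete: as I understand the DM documentation, the task file’s `task-mode` accepts `full`, `incremental`, and `all`, and in `all` mode DM first dumps and loads the existing data, then continues incremental replication from the binlog position recorded at dump time. The small lookup below is just an illustration of that reading; the names `STAGES_BY_TASK_MODE` and `stages_for` are made up for this sketch and are not part of DM.

```python
# Stages DM runs for each task-mode value (my reading of the DM docs,
# not taken from DM's source code).
STAGES_BY_TASK_MODE = {
    "full": ["dump", "load"],          # one-off copy of existing data
    "incremental": ["sync"],           # binlog replication only
    "all": ["dump", "load", "sync"],   # full copy, then binlog replication
}


def stages_for(task_mode):
    """Return the ordered stages for a task-mode, or raise on unknown values."""
    try:
        return STAGES_BY_TASK_MODE[task_mode]
    except KeyError:
        raise ValueError(f"unknown task-mode: {task_mode!r}") from None


def does_full_copy_first(task_mode):
    """True if the mode starts with a full dump of existing data."""
    return stages_for(task_mode)[0] == "dump"


print(stages_for("all"))  # ['dump', 'load', 'sync']
```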

| username: 大飞哥online | Original post link

CDC can only handle incremental data and needs to be used in conjunction with full data.

| username: 舞动梦灵 | Original post link

Brother Dafei, I have a question. With the same environment and configuration, which would be faster for 1 TB of data: scaling out and in, or DM? The official documentation says DM runs at 30-50 GB per hour. What is the approximate rebalancing speed when scaling out and in? Is this described in the official documentation?
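For a rough sense of scale, taking the 30-50 GB per hour figure at face value and assuming 1 TB = 1024 GB, the arithmetic works out as follows. This is only a back-of-the-envelope estimate; real throughput depends on hardware, network, and data shape.

```python
def hours_to_migrate(data_gb, rate_gb_per_hour):
    """Hours needed to move data_gb at a steady rate_gb_per_hour."""
    return data_gb / rate_gb_per_hour


DATA_GB = 1024  # 1 TB, assuming 1 TB = 1024 GB

best_case = hours_to_migrate(DATA_GB, 50)   # at the fast end of the range
worst_case = hours_to_migrate(DATA_GB, 30)  # at the slow end of the range

print(f"{best_case:.1f} to {worst_case:.1f} hours")  # 20.5 to 34.1 hours
```

So at the quoted rate, 1 TB would take somewhere between about 20 and 34 hours for the full-copy phase alone.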

| username: 像风一样的男子 | Original post link

DM supports MySQL as the upstream database, but does not support TiDB.

| username: 大飞哥online | Original post link

DM supports full data migration and incremental data synchronization from databases compatible with the MySQL protocol (MySQL, MariaDB, Aurora MySQL) to TiDB. MySQL 8.0 is still experimental.

It is better to use BR for full backup and TiCDC for incremental synchronization. When considering scaling in and out, you need to take into account the network between the two cloud environments. The time required for scaling in and out depends on various factors such as cluster resources, network, data volume, etc. It’s hard to estimate.

| username: Kongdom | Original post link

If downtime is not acceptable, it seems the only option is scaling out and in.

| username: 舞动梦灵 | Original post link

The official DM documentation lists the supported upstream databases, but it doesn’t say whether TiDB is supported as the source. I also suspect it only supports MySQL and MariaDB, because they have binlog files numbered from 0001 up to the latest, whereas TiDB doesn’t seem to accumulate binlogs in that way.

| username: 大飞哥online | Original post link

Yes, scaling out and in takes a long time, and the cluster would span different cloud environments while the business is still using it, so you need to pay more attention to things like leader distribution. BR+CDC doesn’t require downtime and takes less time, but all the hardware resources for the new cluster need to be in place.