Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIDB数据库连续性及迁移问题
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issues: Issue Phenomenon and Impact]
I have a few questions:
Question 1: How does a distributed cluster ensure business continuity? According to the documentation, there are multiple instances (does this refer to multiple replicas?). For a distributed database like TiDB, is it necessary to back up any important files? If so, what tools are recommended for backup? If one instance fails among multiple instances, does it require manual intervention, or will it self-repair?
Question 2: Migration Issues
I see that the migration tools mentioned in the documentation refer to upstream and downstream. Does this mean these tools only support online migration? Must the upstream and downstream be interconnected for migration? Are there tools similar to RMAN or EXP/impdp, or mysqldump?
Question 3: I see that TiDB is compatible with the MySQL protocol. Should this be understood as MySQL tools like mysqldump and MySQL clients being applicable to TiDB as well? If these clients are installed directly, will the tools also be usable?
Regardless of the database, backups are essential, right?
Manual intervention is required.
Question 1: TiDB by default has 3 replicas, meaning the data is stored in 3 copies on different TiKV nodes. Backup is done using BR + logs, and it can also be synchronized in real-time to downstream using TiCDC. A single instance failure does not affect business operations; if possible, the database will recover itself, otherwise, manual intervention is required.
Question 2: Upstream and downstream refer to online data synchronization or data migration. To synchronize MySQL to TiDB, use the DM tool; to synchronize TiDB to downstream (MySQL, TiDB, Kafka, etc.), use the TiCDC tool. Similar to tools like RMAN or EXP, TiDB comes with BR for physical backups and Dumpling for logical backups.
Question 3: MySQL clients can be used directly, and MYSQLDUMP has been replaced by Dumpling. Using MYSQLDUMP directly with newer versions may have compatibility issues.
Business continuity should refer to all TiDB nodes being highly available, planning multiple TiDB servers, TiKV, etc.
These documents include both physical backups and logical backups. MySQL client tools are generally compatible.
Take a good look at the training videos; most of your questions will be answered.
Backups are essential. What databases are you using upstream and downstream?
The MySQL client is compatible.
For the migration in question two, generally speaking, it refers to the synchronization and migration of incremental data. The newly generated data in the upstream database will be synchronized to the downstream database in real-time. Therefore, it is necessary to ensure that both the upstream and downstream instances are online, and considering the network environment, it is essential to ensure that the network between the upstream and downstream instances is already connected.
Regarding question three, the SQL protocols of MySQL and TiDB are indeed interoperable. For example, an SQL statement that can run in MySQL can also run in TiDB. However, in terms of ecosystem tools, MySQL’s native mysqldump is not supported for use in TiDB. This is because the logical backup principle of mysqldump involves adding a global lock and then reading the data snapshot. In contrast, TiDB does not add a global lock but directly obtains the data snapshot at a specific timestamp by setting the tidb_snapshot system variable (this is related to TiDB’s MVCC, and you can learn about the MVCC principles of TiDB).
I haven’t seen it yet. In which video course does this appear?
Yes, planning to test it.
There is documentation available, but I rarely watch videos when learning a new database. I usually refer to the documentation.
Yes, the documentation will be more detailed and will also be based on one’s own progress.
Question one, a distributed database ensures business continuity through high-availability deployment methods. As you mentioned, multiple replicas essentially mean high availability. Backup is a routine maintenance task that needs to be done regardless of whether the system is centralized or distributed. TiDB places great emphasis on maintenance efficiency, and you can refer to the relevant chapters for backup and recovery tools. Regarding automatic repair versus manual repair, most of the time it is automatic, but there are certain situations where manual intervention is required, so it is important to have good backups.
Question three, although it is protocol-compatible, it is recommended to use the tools provided by the original manufacturer to reduce potential issues.