TiDB Database Continuity and Migration Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB数据库连续性及迁移问题

| username: TiDBer_xgwIsUrp

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issues: Issue Phenomenon and Impact]
I have a few questions:
Question 1: How does a distributed cluster ensure business continuity? According to the documentation, there are multiple instances (does this refer to multiple replicas?). For a distributed database like TiDB, is it necessary to back up any important files? If so, what tools are recommended for backup? If one instance fails among multiple instances, does it require manual intervention, or will it self-repair?

Question 2: Migration Issues
I see that the migration tools mentioned in the documentation refer to upstream and downstream. Does this mean these tools only support online migration? Must the upstream and downstream be interconnected for migration? Are there tools similar to RMAN or EXP/impdp, or mysqldump?

Question 3: I see that TiDB is compatible with the MySQL protocol. Should this be understood as MySQL tools like mysqldump and MySQL clients being applicable to TiDB as well? If these clients are installed directly, will the tools also be usable?

| username: TiDBer_21wZg5fm | Original post link

Regardless of the database, backups are essential, right?

| username: TIDB-Learner | Original post link

  1. The possibility of issues occurring in a distributed database is lower than in a centralized one, but that doesn’t mean disaster recovery isn’t needed.
  2. The migration tools are comprehensive. It is recommended to first check the official documentation.
  3. It is compatible with MySQL clients.
| username: YuchongXU | Original post link

Manual intervention is required.

| username: zhanggame1 | Original post link

Question 1: TiDB by default has 3 replicas, meaning the data is stored in 3 copies on different TiKV nodes. Backup is done using BR + logs, and it can also be synchronized in real-time to downstream using TiCDC. A single instance failure does not affect business operations; if possible, the database will recover itself, otherwise, manual intervention is required.

Question 2: Upstream and downstream refer to online data synchronization or data migration. To synchronize MySQL to TiDB, use the DM tool; to synchronize TiDB to downstream (MySQL, TiDB, Kafka, etc.), use the TiCDC tool. Similar to tools like RMAN or EXP, TiDB comes with BR for physical backups and Dumpling for logical backups.

Question 3: MySQL clients can be used directly, and MYSQLDUMP has been replaced by Dumpling. Using MYSQLDUMP directly with newer versions may have compatibility issues.

| username: wangkk2024 | Original post link

Business continuity should refer to all TiDB nodes being highly available, planning multiple TiDB servers, TiKV, etc.

| username: DBAER | Original post link

These documents include both physical backups and logical backups. MySQL client tools are generally compatible.

| username: dba远航 | Original post link

Take a good look at the training videos; most of your questions will be answered.

| username: zhang_2023 | Original post link

Backups are essential. What databases are you using upstream and downstream?

| username: Hacker_PtIIxHC1 | Original post link

  1. It is necessary to back up the relevant cluster topology files under the TiUP control machine (to prevent the control machine from crashing and being unable to handle it centrally from other machines), and the TiDB cluster data itself needs to be backed up.
  2. For upstream and downstream migration, there are DM\TICDC tools, and it is necessary to ensure that the tools and upstream and downstream can communicate.
  3. The MySQL client is monitored, but mysqldump backup is too slow. You can use dumpling or BR for physical backup.
| username: Hacker_PtIIxHC1 | Original post link

The MySQL client is compatible.

| username: RyanHowe | Original post link

For the migration in question two, generally speaking, it refers to the synchronization and migration of incremental data. The newly generated data in the upstream database will be synchronized to the downstream database in real-time. Therefore, it is necessary to ensure that both the upstream and downstream instances are online, and considering the network environment, it is essential to ensure that the network between the upstream and downstream instances is already connected.

Regarding question three, the SQL protocols of MySQL and TiDB are indeed interoperable. For example, an SQL statement that can run in MySQL can also run in TiDB. However, in terms of ecosystem tools, MySQL’s native mysqldump is not supported for use in TiDB. This is because the logical backup principle of mysqldump involves adding a global lock and then reading the data snapshot. In contrast, TiDB does not add a global lock but directly obtains the data snapshot at a specific timestamp by setting the tidb_snapshot system variable (this is related to TiDB’s MVCC, and you can learn about the MVCC principles of TiDB).

| username: 友利奈绪 | Original post link

  1. All databases involved in production require manual intervention for backups, even though they inherently have high availability.
  2. For upstream and downstream migration, there are tools like DM and TiCDC. You need to ensure that the tools and the upstream and downstream systems can communicate, or you can test other tools like cc or DataX.
  3. It only supports the MySQL protocol, so you can use JDBC or clients for connection and most SQL statements. Note that stored procedures are not supported, so you need to avoid them.
| username: virvle | Original post link

I haven’t seen it yet. In which video course does this appear?

| username: TiDBer_xgwIsUrp | Original post link

Yes, planning to test it.

| username: TiDBer_xgwIsUrp | Original post link

There is documentation available, but I rarely watch videos when learning a new database. I usually refer to the documentation.

| username: virvle | Original post link

Yes, the documentation will be more detailed and will also be based on one’s own progress.

| username: TiDBer_JUi6UvZm | Original post link

Question one, a distributed database ensures business continuity through high-availability deployment methods. As you mentioned, multiple replicas essentially mean high availability. Backup is a routine maintenance task that needs to be done regardless of whether the system is centralized or distributed. TiDB places great emphasis on maintenance efficiency, and you can refer to the relevant chapters for backup and recovery tools. Regarding automatic repair versus manual repair, most of the time it is automatic, but there are certain situations where manual intervention is required, so it is important to have good backups.

| username: TiDBer_JUi6UvZm | Original post link

Question three, although it is protocol-compatible, it is recommended to use the tools provided by the original manufacturer to reduce potential issues.