Why Doesn't TiDB's BR Use Physical Backup?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB的BR为什么不用物理备份?

| username: TiDBer_VDhlK8wi

TiDB uses RocksDB, and RocksDB has a physical backup tool called myrocks_hotbackup. Why doesn't TiDB adopt a physical backup solution?

After looking into how BR backup currently works, I found that it is still a logical backup, and that restore also has to rewrite keys. This seems likely to perform poorly on large data volumes. In traditional databases such as Oracle and MySQL, logical backup is slower than physical backup by several orders of magnitude.

So, I would like to ask why TiDB’s BR does not use a physical backup solution?

| username: jiyf

  1. BR is a physical backup. Logical backup means exporting data as databases and tables through the TiDB server, for example with the Dumpling tool.
  2. BR scans the key-value data of the region leader on the current instance through TiKV and directly generates RocksDB SST files.
  3. To get a consistent backup, BR uses a [startTs, endTs) range. Because TiDB is built on MVCC, any version that is scanned but falls outside this range is filtered out.
  4. Keys are rewritten during restore because each key contains the tableId and indexId. On the restored table, the tableId and indexId may be different, so the rewrite replaces the old prefix with the new one. The restore itself is still physical.
  5. BR only backs up the region data that is the leader on the current TiKV. Since TiKV has multiple replicas, it only needs to back up the data of one replica.
  6. Isn't myrocks_hotbackup limited to MyRocks, the RocksDB storage engine for MySQL? It should not be applicable to the TiDB system. Also, as mentioned in point 5, there is no need to back up every replica from every TiKV.
  7. BR is a distributed backup, so all TiKV nodes can perform the backup simultaneously, making the backup speed acceptable.
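
To illustrate points 3 and 4, here is a minimal Python sketch, not BR's actual code. It assumes a simplified form of TiDB's row key layout (`t` + table ID + `_r` + row ID, with each ID stored as a sign-flipped big-endian 64-bit integer) and ignores the extra encoding TiKV applies on top of it. It also assumes the [startTs, endTs) filter is applied to each version's commit timestamp. The names `record_key`, `rewrite_key`, and `filter_versions` are made up for this example.

```python
import struct

SIGN_MASK = 0x8000000000000000  # flipping the sign bit keeps byte order == numeric order


def encode_int(v: int) -> bytes:
    """Encode a signed 64-bit integer so that byte-wise comparison matches numeric order."""
    return struct.pack(">Q", (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK)


def record_key(table_id: int, row_id: int) -> bytes:
    """Simplified row key: 't' + tableId + '_r' + rowId (hypothetical helper)."""
    return b"t" + encode_int(table_id) + b"_r" + encode_int(row_id)


def rewrite_key(key: bytes, old_table_id: int, new_table_id: int) -> bytes:
    """Swap the table prefix of a key; the rest of the key is copied unchanged."""
    old_prefix = b"t" + encode_int(old_table_id)
    if not key.startswith(old_prefix):
        raise ValueError("key does not belong to the old table")
    return b"t" + encode_int(new_table_id) + key[len(old_prefix):]


def filter_versions(versions, start_ts: int, end_ts: int):
    """Keep only (key, commit_ts, value) versions with start_ts <= commit_ts < end_ts."""
    return [(k, ts, v) for (k, ts, v) in versions if start_ts <= ts < end_ts]
```

For example, `rewrite_key(record_key(45, 1), 45, 80)` gives the same bytes as `record_key(80, 1)`: only the prefix changes, and the stored value never has to be decoded or re-inserted through SQL.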

| username: TiDBer_VDhlK8wi

I read about BR's design principles in the documentation in the TiDB 6.2 source code. BR's basic implementation is still a select *; the only difference from mydumper is that it runs distributed. By direct physical backup I mean copying RocksDB's physical files, rather than having TiKV select all the data and build SST files from it.

| username: jiyf

The backup principle you posted is exactly what I described above; see points 2, 3, 4, and 5.

  1. As in point 5 above, only one replica's data needs to be backed up. If every TiKV did a physical backup, multiple replicas of the same data would be backed up.
  2. For the TiDB system, a tool like the one you mention may not be suitable, or may simply not have been developed yet. That tool is built for the MyRocks engine in MySQL, which is different from how TiKV uses RocksDB.
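
A rough way to see the replica point is to compare the two approaches on a toy cluster. The sketch below is only an illustration under assumed numbers: the region sizes, store names, and the functions `leader_only_plan` and `backup_volume_mb` are all hypothetical, not part of BR or TiKV.

```python
from collections import defaultdict

# Hypothetical cluster: three regions, each with three replicas spread over three stores.
REGIONS = [
    {"id": 1, "size_mb": 96, "leader": "tikv-1", "peers": ["tikv-1", "tikv-2", "tikv-3"]},
    {"id": 2, "size_mb": 96, "leader": "tikv-2", "peers": ["tikv-1", "tikv-2", "tikv-3"]},
    {"id": 3, "size_mb": 96, "leader": "tikv-3", "peers": ["tikv-1", "tikv-2", "tikv-3"]},
]


def leader_only_plan(regions):
    """Assign each region to the store holding its leader, so each region is backed up once."""
    plan = defaultdict(list)
    for region in regions:
        plan[region["leader"]].append(region["id"])
    return dict(plan)


def backup_volume_mb(regions):
    """Return (leader-only volume, copy-every-store volume) in MB."""
    leader_only = sum(r["size_mb"] for r in regions)
    every_store = sum(r["size_mb"] * len(r["peers"]) for r in regions)
    return leader_only, every_store
```

With these assumed numbers, `backup_volume_mb(REGIONS)` returns `(288, 864)`: copying the files of every store backs up three times as much data, while the leader-only plan still spreads the work across all three stores.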