Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: How do I perform point-in-time recovery in TiDB? (tidb怎么按时间点恢复?)
The official BR documentation gives examples of restoring a single table, but it doesn’t explain how to use binlogs for point-in-time recovery. If my last backup was taken the previous day, wouldn’t a restore lose a day’s worth of data? Is there a tool that consumes TiDB’s binlog, or does TiDB not use binlog for recovery at all?
You can refer to the Reparo User Guide (Reparo 使用文档) in the PingCAP Documentation Center for incremental recovery from binlog.
Full backup and full restore: BR or Dumpling
Incremental backup: Drainer’s file mode output (tidb-binlog)
Incremental restore: Reparo
The timestamp that links the full backup to the incremental backup, that is, the point from which incremental replay should start, is recorded in the meta file in the BR or Dumpling backup directory.
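To make the "connecting timestamp" idea concrete, here is a minimal Python sketch that pulls the position out of a Dumpling-style metadata file and converts it to wall-clock time. The file layout in `SAMPLE_METADATA` and the helper names `parse_backup_tso` and `tso_to_datetime` are assumptions for illustration; check the metadata file in your own backup directory, since the exact format may differ by version. The only fact relied on is that a TiDB TSO keeps a millisecond timestamp in its high bits and an 18-bit logical counter in its low bits.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout of a Dumpling `metadata` file; the values are illustrative.
SAMPLE_METADATA = """\
Started dump at: 2020-11-10 10:40:19
SHOW MASTER STATUS:
        Log: tidb-binlog
        Pos: 420747102018863124
        GTID:

Finished dump at: 2020-11-10 10:40:20
"""


def parse_backup_tso(metadata_text: str) -> int:
    """Return the TSO found on the `Pos:` line of the metadata text."""
    match = re.search(r"^\s*Pos:\s*(\d+)\s*$", metadata_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pos:' line found in metadata")
    return int(match.group(1))


def tso_to_datetime(tso: int) -> datetime:
    """Convert a TiDB TSO to UTC wall-clock time.

    The low 18 bits are a logical counter; the remaining high bits are
    a physical timestamp in milliseconds since the Unix epoch.
    """
    physical_ms = tso >> 18
    seconds, millis = divmod(physical_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


if __name__ == "__main__":
    tso = parse_backup_tso(SAMPLE_METADATA)
    print(tso, tso_to_datetime(tso).isoformat())
```

The extracted TSO is the value you would hand to the incremental restore as its starting position, so that replay begins exactly where the full backup ends.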
Is this the only way? Then I would still need to install dumpling and drainer for this. According to the official recommendation, dumpling requires 3 nodes and drainer requires one node, which is still fairly resource-intensive!
Dumpling is a logical backup tool similar to mydumper. Did you mean that Pump requires 3 nodes? Currently, that is the case. In a future version, once BR supports incremental file backup, BR alone will cover the entire PITR workflow.
A follow-up question: my goal is to generate binlogs the way MySQL does, purely for recovery. But TiDB seems to require both Pump (which collects the data) and Drainer (which parses it). Both Pump and Drainer record binlogs, and Reparo recovers from the data that Drainer has parsed, so installing Pump alone is not enough. Drainer, however, requires a downstream dest_type, and I have no downstream to replicate to, so in that sense I don’t need Drainer either. How can this be resolved?
You can refer to the Reparo User Guide (Reparo 使用文档) in the PingCAP Documentation Center for incremental recovery from binlog.
You can extend the GC lifetime and use the flashback feature. There is an official tutorial. Recovering data this way is very simple and doesn’t require any extra tools.
Currently, incremental file backup uses drainer’s file output mode, which is db-type = file.
I’m worried that the entire cluster might go down, in which case no commands could be executed.
Binlog is the last resort.
Your concern is unnecessary. First, a distributed system has no single point of failure. Second, even if a sudden power outage takes down all your machines, there won’t be any issues. Of course, the data center does need a UPS installed.
In theory, you are right, but I always feel more secure using binlog. What if the data needed for flashback has already been garbage-collected, or it takes a long time to notice that a table is missing? May I ask whether your production environment uses flashback plus BR (or Dumpling)? Have you not enabled binlog?
Our business has many update scenarios. When the project first launched, I set tidb_gc_life_time to 48 hours; I didn’t dare set it any longer.
48 hours will cause many problems. I only dare to set it to 10 minutes.
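The trade-off discussed above (a 48-hour versus a 10-minute GC lifetime) can be illustrated with a small Python sketch. It estimates whether the state at a given moment is still recoverable via flashback or a historical read. The function name `within_gc_window` is hypothetical, and the rule "safe point = now minus GC lifetime" is a simplifying assumption: the real safe point is advanced periodically by GC runs, so treat the result as an estimate rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone


def within_gc_window(target: datetime, now: datetime, gc_life_time: timedelta) -> bool:
    """Estimate whether data as of `target` has not been garbage-collected yet.

    Simplifying assumption: the GC safe point is `now - gc_life_time`.
    """
    safe_point = now - gc_life_time
    return safe_point <= target <= now


if __name__ == "__main__":
    now = datetime(2022, 1, 3, 12, 0, tzinfo=timezone.utc)
    # Hypothetical scenario: a table was dropped 30 minutes ago.
    before_drop = now - timedelta(minutes=30)
    for life_time in (timedelta(minutes=10), timedelta(hours=48)):
        print(life_time, within_gc_window(before_drop, now, life_time))
```

With a 10-minute lifetime, a mistake noticed 30 minutes later is already outside the window, which is the scenario where a full backup plus binlog replay would be needed instead.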
It seems that point-in-time recovery is not supported yet.
What are the industry-standard backup methods for TiDB? Can anyone provide some guidance? Do we really not need to enable binlog? We should enable it, right?
Don’t worry about TiDB crashing, it’s impossible.
TiDB can be overwhelmed but it never has issues.