Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB集群有必要做数据备份吗,有什么推荐的备份方式呢 (Is it necessary to back up data in a TiDB cluster, and what backup methods are recommended?)
1: By default, a TiDB cluster keeps three replicas of each piece of data. With three or more TiKV nodes, each piece of data therefore already has two additional copies on other nodes. Is it still necessary to back up all the data in the cluster?
2: If backups are needed, which methods are recommended? Which backup solutions are already in use in production environments?
Official methods include:
- Backup using the br tool
- Backup using the Dumpling tool
- Set up dual-cluster master-slave replication?
In a production environment, backups are essential. Three replicas only ensure that data is not lost when a single machine goes down; they cannot restore an earlier state of the data, for example after an accidental delete. For large volumes of data, use BR; for small volumes, you can use Dumpling. Dual-cluster replication is more costly and is generally used for disaster recovery.
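The one-machine limit follows from majority voting: TiKV replicates data with Raft, which stays available only while a majority of replicas survive. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of any TiDB API.

```python
def tolerated_failures(replicas: int) -> int:
    """Number of replicas that can be lost while a majority still remains."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    majority = replicas // 2 + 1
    return replicas - majority


for n in (3, 5):
    print(f"{n} replicas: majority={n // 2 + 1}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With the default three replicas this gives one tolerated failure. It says nothing about logical mistakes: a wrong DELETE is replicated to every copy, which is why a separate backup is still needed.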
How often do you usually back up?
It depends on individual needs. Generally, if the cluster is not particularly large, a backup once a day should suffice.
Are you doing incremental backups or full backups? How large is the data when you say it’s not big? Is a weekly backup reliable?
I am doing full backups. Whether the data counts as "big" depends on how long it takes you to complete a full backup.
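One rough way to apply this rule of thumb is to estimate how long a full backup would take and compare it with the window you can afford. The sketch below assumes a steady average throughput, which is a simplification; the numbers in the example are made up, so measure your own cluster before relying on them.

```python
def full_backup_hours(data_gib: float, throughput_mib_s: float) -> float:
    """Estimated duration of a full backup, in hours.

    throughput_mib_s is an assumed average backup speed in MiB/s.
    """
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gib * 1024 / throughput_mib_s
    return seconds / 3600


def fits_window(data_gib: float, throughput_mib_s: float,
                window_hours: float) -> bool:
    """True if the estimated full backup finishes within the window."""
    return full_backup_hours(data_gib, throughput_mib_s) <= window_hours


# Hypothetical cluster: 2 TiB of data, 200 MiB/s average backup speed.
hours = full_backup_hours(2048, 200)
print(f"estimated full backup: {hours:.1f} h")
print("fits a 4 h nightly window:", fits_window(2048, 200, 4))
```

If the estimate no longer fits the window, that is a sign to back up less often in full and fill the gaps with incremental backups.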
Backups are essential. The frequency of backups can be tailored to the actual situation and adjusted accordingly.
Do you back up locally or to a cloud storage service like S3? Are you using BR or Dumpling? I’m worried that uploading large amounts of data directly to S3 might get interrupted.
Backups are definitely necessary. They can be saved to a shared disk so that, if several machines go down, the data can still be restored quickly by scaling nodes in or out.
Makes sense, but if multiple nodes fail, for example, if two out of three nodes fail, how do you recover the data? Do you directly use the backup data to rebuild the cluster?
Backups are very much necessary, and they are even more useful combined with binlog. For example, if you want to restore the cluster to its state at 6 PM two days ago, you can perform a point-in-time recovery.
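Conceptually, a point-in-time recovery restores the latest full backup taken at or before the target time and then replays the logged changes up to the target. The sketch below only shows how that base backup could be picked; the function name and the daily 02:00 schedule are assumptions for illustration, and it presumes the change log covers the whole replay window.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def plan_point_in_time_restore(
    full_backups: List[datetime], target: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return (base_backup_time, target), or None if no backup precedes target.

    The base is the latest full backup completed at or before the target;
    changes between the base and the target must be replayed from the log.
    """
    candidates = [t for t in full_backups if t <= target]
    if not candidates:
        return None
    return max(candidates), target


# Hypothetical schedule: one full backup every day at 02:00.
backups = [datetime(2023, 5, day, 2, 0) for day in (10, 11, 12, 13)]
target = datetime(2023, 5, 11, 18, 0)  # "6 PM two days ago"

plan = plan_point_in_time_restore(backups, target)
if plan is not None:
    base, end = plan
    print(f"restore full backup from {base}, then replay changes up to {end}")
```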
For data larger than 1TB, it is recommended to use BR backup.
For data smaller than 1TB, you can use Dumpling backup.
If possible, back up to object storage like S3.
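The size rule above can be written as a small helper that picks the tool and assembles a command line. This is a sketch under stated assumptions: 1TB is interpreted as 1 TiB, data of exactly that size is sent to BR (the thread does not say), the bucket name and addresses are placeholders, and the flags shown are the basic ones as I understand them from the BR and Dumpling documentation, so check them against your version. The command is only printed, never executed.

```python
import shlex
from typing import List

ONE_TB = 1024 ** 4  # the thread's 1TB threshold, taken here as 1 TiB (assumption)


def choose_tool(data_bytes: int) -> str:
    """Pick 'br' for large data sets and 'dumpling' for small ones."""
    return "br" if data_bytes >= ONE_TB else "dumpling"


def build_command(data_bytes: int, storage_url: str,
                  pd_addr: str = "127.0.0.1:2379",
                  host: str = "127.0.0.1", port: int = 4000,
                  user: str = "root") -> List[str]:
    """Assemble (but do not run) a backup command for the chosen tool."""
    if choose_tool(data_bytes) == "br":
        # BR talks to PD and writes the backup to external storage.
        return ["tiup", "br", "backup", "full",
                "--pd", pd_addr, "--storage", storage_url]
    # Dumpling connects to TiDB over the SQL port and exports logical dumps.
    return ["tiup", "dumpling", "-h", host, "-P", str(port),
            "-u", user, "-o", storage_url]


# Placeholder bucket name; replace with your own object storage location.
print(shlex.join(build_command(3 * ONE_TB, "s3://example-bucket/backup-full")))
print(shlex.join(build_command(200 * 1024 ** 3, "s3://example-bucket/dump")))
```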
If it’s a database, you need to make backups. No one knows when the machine will crash. If something goes wrong and you can’t recover it, you’re screwed.
The backup frequency depends on how much data loss the user can tolerate. If you can’t afford to lose any data, then do a full backup every day and incremental backups in real time. This way, if something goes wrong, there’s still hope for recovery.
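The trade-off can be made concrete by computing how much data is lost when a failure hits between backups. The sketch below compares a daily full backup alone with the same schedule plus hourly incrementals; the schedule and failure time are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def data_loss_window(backup_times: List[datetime],
                     failure: datetime) -> timedelta:
    """Time between the last completed backup and the failure."""
    usable = [t for t in backup_times if t <= failure]
    if not usable:
        raise ValueError("no backup completed before the failure")
    return failure - max(usable)


# Hypothetical day: full backup at 02:00, incrementals every hour after it.
full = datetime(2023, 5, 13, 2, 0)
incrementals = [full + timedelta(hours=h) for h in range(1, 22)]
failure = datetime(2023, 5, 13, 17, 40)

print("daily full only:     ", data_loss_window([full], failure))
print("with hourly backups: ", data_loss_window([full] + incrementals, failure))
```

The shorter the interval between incremental backups, the smaller this window becomes, at the cost of more backup traffic and storage.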
There are still many scenarios requiring backup and recovery; production databases need to be backed up.
Database backup is the last line of defense; it must be done.
Backups are essential. Depending on the size of your data, you can perform full and incremental backups based on your acceptable backup time.