Is it necessary to back up data in a TiDB cluster, and what are the recommended backup methods?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB集群有必要做数据备份吗,有什么推荐的备份方式呢

| username: TiDBer_oqrCNpbV

1: By default, a TiDB cluster keeps three replicas of each piece of data. With three or more TiKV nodes, each piece of data already has two copies on other nodes. Is it still necessary to back up all the data in the cluster?
2: If backups are needed, which methods are recommended, ideally ones that have already been used in production environments?
The official methods include:

  • Backup using the BR tool
  • Backup using the Dumpling tool
  • Setting up dual-cluster master-slave replication?

| username: 啦啦啦啦啦 | Original post link

In a production environment, backups are essential. Three replicas only ensure that data is not lost when a single machine goes down; they cannot restore the data to an earlier state. For large volumes of data, use BR; for small volumes, you can use Dumpling. Dual-cluster replication is more costly and is generally used for disaster recovery.
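
To make the "one machine" point concrete, here is a minimal sketch of the majority arithmetic behind it. It assumes the replicas form a Raft group that needs a strict majority to stay available, which is how TiKV replication is generally described; the function names are made up for illustration.

```python
def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a strict majority still survives."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (replicas - 1) // 2


def has_majority(replicas: int, failed: int) -> bool:
    """True if the surviving replicas still form a strict majority."""
    if not 0 <= failed <= replicas:
        raise ValueError("failed must be between 0 and replicas")
    return (replicas - failed) > replicas // 2


# With the default of three replicas, one failure is survivable,
# two failures are not.
three_replica_tolerance = tolerated_failures(3)
survives_two_failures = has_majority(3, 2)
```

Note that this only covers node failures. An accidental `DELETE` or `DROP TABLE` is replicated to all three copies, so only a backup helps in that case.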

| username: TiDBer_oqrCNpbV | Original post link

How often do you usually back up?

| username: 啦啦啦啦啦 | Original post link

It depends on individual needs. Generally, if the cluster is not particularly large, a backup once a day should suffice.

| username: TiDBer_oqrCNpbV | Original post link

Are you doing incremental backups or full backups? How large is the data when you say it’s not big? Is a weekly backup reliable?

| username: 啦啦啦啦啦 | Original post link

I am doing full backups. Whether the data counts as large depends on how long a full backup takes you to complete.

| username: wisdom | Original post link

Backups are essential. The frequency of backups can be tailored to the actual situation and adjusted accordingly.

| username: TiDBer_oqrCNpbV | Original post link

Do you back up locally or to a cloud storage service like S3? Are you using BR or Dumpling? I’m worried that uploading large amounts of data directly to S3 might get interrupted.

| username: gary | Original post link

Backups are definitely necessary. They can be saved to a shared disk, so that if multiple machines go down, data can still be restored quickly by scaling nodes in or out.

| username: TiDBer_oqrCNpbV | Original post link

Makes sense, but if multiple nodes fail, for example, if two out of three nodes fail, how do you recover the data? Do you directly use the backup data to rebuild the cluster?

| username: buddyyuan | Original post link

Backups are very necessary, and it is even better to combine them with binlog. For example, if you want to restore to 6 PM two days ago, you can perform a point-in-time recovery.

For data larger than 1TB, it is recommended to use BR backup.
For data smaller than 1TB, you can use Dumpling backup.

If possible, back up to object storage like S3.
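
As a rough sketch of the rule of thumb above, the helper below picks a tool by data size and builds the matching command line. Several things here are assumptions and not from this thread: the tools are run through TiUP, "1TB" is read as 10^12 bytes, exactly 1TB falls to Dumpling, and the addresses, user, and bucket are placeholders. S3 credentials are not handled.

```python
ONE_TB = 10**12  # assumption: "1TB" read as decimal terabytes


def choose_tool(data_bytes: int) -> str:
    """Apply the thread's rule of thumb: BR above 1TB, Dumpling otherwise."""
    return "br" if data_bytes > ONE_TB else "dumpling"


def backup_command(
    data_bytes: int,
    storage: str,
    pd_addr: str = "127.0.0.1:2379",   # placeholder PD address
    tidb_host: str = "127.0.0.1",      # placeholder TiDB host
    tidb_port: int = 4000,
    user: str = "root",
) -> list[str]:
    """Return the argv list for a full backup to the given storage URI."""
    if choose_tool(data_bytes) == "br":
        # BR connects to PD and writes a physical backup.
        return ["tiup", "br", "backup", "full",
                "--pd", pd_addr, "--storage", storage]
    # Dumpling connects to TiDB over SQL and writes a logical export.
    return ["tiup", "dumpling", "-h", tidb_host, "-P", str(tidb_port),
            "-u", user, "-o", storage]


# Hypothetical bucket name, for illustration only.
large = backup_command(3 * ONE_TB, "s3://my-backup-bucket/full-2024-01-01")
small = backup_command(200 * 10**9, "s3://my-backup-bucket/dump-2024-01-01")
```

Joining either list with spaces gives a command you can review before running it by hand or from a scheduler.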

| username: 张雨齐0720 | Original post link

If it’s a database, you need to make backups. No one knows when the machine will crash. If something goes wrong and you can’t recover it, you’re screwed.
The backup frequency depends on how much data loss the user can tolerate. If you can’t afford to lose any data, then do a full backup every day and incremental backups in real time. This way, if something goes wrong, there’s still hope for recovery.
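
To show how daily full backups and incrementals fit together at restore time, here is a minimal sketch that picks which backups to replay for a given target time. The `Backup` record and `restore_chain` function are hypothetical, not part of BR or any TiDB tool, and the sketch assumes each incremental builds on the backup taken just before it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "incr"
    taken_at: datetime


def restore_chain(backups: list[Backup], target: datetime) -> list[Backup]:
    """Latest full backup at or before target, plus the incrementals after it."""
    # Only backups taken at or before the target time can be used.
    usable = sorted((b for b in backups if b.taken_at <= target),
                    key=lambda b: b.taken_at)
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup at or before the target time")
    # Everything after the last usable full backup is an incremental.
    return usable[full_positions[-1]:]


backups = [
    Backup("full", datetime(2024, 1, 1, 2, 0)),
    Backup("incr", datetime(2024, 1, 1, 12, 0)),
    Backup("full", datetime(2024, 1, 2, 2, 0)),
    Backup("incr", datetime(2024, 1, 2, 12, 0)),
    Backup("incr", datetime(2024, 1, 2, 20, 0)),
]
chain = restore_chain(backups, datetime(2024, 1, 2, 18, 0))
```

The chain only reaches the last incremental taken before the target, so anything written after that backup is lost. That gap is the reason to take incrementals more often when less data loss is acceptable.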

| username: alfred | Original post link

There are still many scenarios requiring backup and recovery; production databases need to be backed up.

| username: HACK | Original post link

Database backup is the last line of defense; it must be done.

| username: Ming | Original post link

Backups are essential. Depending on the size of your data, you can perform full and incremental backups based on your acceptable backup time.
