Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: dumpling导出的备份文件过大 (Backup files exported by Dumpling are too large)
【TiDB Version】v5.2.4
Dumpling Version: v5.0.3
【Reproduction Path】
Backup Command:
/usr/bin/dumpling -h 10.220.62.24 -P 4000 -u zzzz -p xxxxx -t 16 -F 256MB -o /data/tidb_bak/tidb_10.220.62.24/ -L /data/tidb_bak/dumpling_10.220.62.24_2023013116.log -r 100000
【Encountered Problem: Phenomenon and Impact】
Another cluster on the same version backs up normally: its TiKV usage is about 10T and its backup is roughly 5T.
The problematic cluster uses less than 2T of TiKV space in total, yet its backup files exceed 8T before compression. A large amount of data has been written to this cluster recently, and the tables that take up the most backup space do not contain large fields.
Off-topic, a personal suggestion: it is best to keep the tools on the same version as the cluster. If that is not possible, the tool version should at least be newer than the TiDB version itself.
Different versions may have different compression ratios. Dumpling performs a logical backup and exports SQL statements, so the size of its output reflects the actual data volume fairly accurately.
Okay, thank you. I couldn’t find a matching version from the official source, so I replaced the tool with the latest version. I’ll check the results tomorrow.
Another cluster on the same version backs up normally: its TiKV usage is about 10T and its backup is roughly 5T.
(This information has been updated in the issue)
It doesn’t seem to be related to the compression ratio. I’ll try replacing the version first and see.
Does the TiKV usage figure include all 3 replicas of the data? A backup contains only a single copy.
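As a rough sanity check on the replica question, the figures quoted in this thread can be compared after normalizing to a single replica. The sketch below assumes 3 replicas (the TiDB default, not confirmed for either cluster here) and treats "less than 2T" and "over 8T" as 2 and 8; the function name is illustrative only.

```python
REPLICAS = 3  # assumption: default replica count, not confirmed in the thread


def export_ratio(tikv_used_tb, backup_tb, replicas=REPLICAS):
    """Size of the logical export relative to ONE replica's on-disk size.

    tikv_used_tb is assumed to count every replica, so one replica
    occupies tikv_used_tb / replicas on disk.
    """
    return backup_tb * replicas / tikv_used_tb


# Figures quoted in the thread, in TB.
normal = export_ratio(tikv_used_tb=10, backup_tb=5)
problem = export_ratio(tikv_used_tb=2, backup_tb=8)

print(f"normal cluster:      export is {normal:.1f}x one replica")
print(f"problematic cluster: export is {problem:.1f}x one replica")
```

Under that assumption, the normal cluster's export is about 1.5 times the size of one replica on disk, while the problematic cluster's is at least 12 times, so the replica count alone would not seem to account for the gap.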
You can back up table by table: finish one table, clean it up, and then move on to the next.
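A table-by-table run could be scripted along these lines. This is a dry-run sketch that only prints one command per table and executes nothing. It reuses the flags from the backup command in the original post and adds `-T` (Dumpling's tables-list option) to restrict each run to a single table. The table names and the `by_table` output directory are hypothetical, and the cleanup step is left as a comment because the thread does not say what it should be.

```python
import shlex


def dumpling_cmd(table, out_root, host="10.220.62.24", port=4000,
                 user="zzzz", password="xxxxx"):
    """Build the dumpling argv for one table, written to its own directory."""
    return [
        "/usr/bin/dumpling",
        "-h", host, "-P", str(port), "-u", user, "-p", password,
        "-t", "16", "-F", "256MB", "-r", "100000",
        "-T", table,                      # export only this db.table
        "-o", f"{out_root}/{table}",      # one output directory per table
        "-L", f"{out_root}/{table}.log",  # one log file per table
    ]


def plan(tables, out_root):
    """Return one shell-quoted command line per table (dry run only)."""
    return [shlex.join(dumpling_cmd(t, out_root)) for t in tables]


# Hypothetical table names; replace with the real db.table list.
for line in plan(["db1.orders", "db1.events"], "/data/tidb_bak/by_table"):
    print(line)
    # After each real run: compress or move that table's directory away
    # to free space before starting the next table.
```

Keeping each table in its own directory makes it easy to see which tables account for most of the export size, and lets each one be cleaned up independently.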