Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: BR backup stuck at checksum for a long time
【TiDB Usage Environment】Production
【TiDB Version】v5.2.0
【Encountered Problem】BR full backup stays in the checksum stage for a long time
【Reproduction Path】None
【Problem Phenomenon and Impact】
【Attachment】
Screenshot: the backup is stuck at the checksum step.
Screenshot: the backup log. No errors are reported, and the total data size is only 288 GB, which is not large. However, the checksum has been stuck at 98.95% for an hour. Is this normal? If I stop the backup process now, can the data still be restored on the target side?
After waiting about 6 hours, it finally completed. The checksum took longer than the backup itself. What could be the reason? The data size is not large, only 290 GB, yet the checksum took around 6 hours. It's quite frustrating.
Check resource usage, or whether something else is putting load on the cluster at the same time.
From experience, this is likely an I/O bottleneck; I had a similar situation before.
Take a look at the monitoring; otherwise we can only guess blindly.
Does it take a long time to checksum every time you back up?
BR performs checksum calculations to ensure the integrity of backup data.
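To see why a checksum can take as long as (or longer than) the backup, here is a minimal sketch of an order-independent table checksum: a CRC64 (ECMA-182) digest per key/value pair, XORed together, plus key and byte counts. This is an assumption about the general shape of the computation (similar to the crc64 XOR / total kvs / total bytes triple that TiDB's ADMIN CHECKSUM TABLE reports), not BR's or TiKV's actual code; the function names are hypothetical. The point it illustrates is that every key/value in the snapshot must be read and hashed, so the cost tracks the amount of data scanned, and a bit-by-bit CRC like this one is much slower than hardware-accelerated versions that use instructions such as PCLMULQDQ.

```python
"""Sketch of an order-independent checksum: XOR of per-pair CRC64 digests.

Assumption: this mirrors the *shape* of a BR/TiKV-style checksum
(crc64 XOR, total kvs, total bytes); it is not their actual implementation.
"""

CRC64_ECMA_REFLECTED = 0xC96C5795D7870F42  # reflected ECMA-182 polynomial
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64_ecma(data: bytes) -> int:
    """Bit-by-bit CRC-64/XZ (ECMA-182). Optimized code uses carry-less multiply."""
    crc = MASK64
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC64_ECMA_REFLECTED
            else:
                crc >>= 1
    return crc ^ MASK64


def checksum(pairs):
    """Return (crc64_xor, total_kvs, total_bytes) over (key, value) byte pairs."""
    crc_xor = 0
    total_kvs = 0
    total_bytes = 0
    for key, value in pairs:
        # Every pair must be read and hashed, so cost scales with data scanned.
        crc_xor ^= crc64_ecma(key + value)
        total_kvs += 1
        total_bytes += len(key) + len(value)
    return crc_xor, total_kvs, total_bytes


if __name__ == "__main__":
    rows = [(b"t1_r1", b"alice"), (b"t1_r2", b"bob")]
    # XOR is commutative, so regions can be checksummed in any order and merged.
    print(checksum(rows) == checksum(reversed(rows)))  # True
```

The XOR combination is what lets the work be split across regions and merged in any order; it does not reduce the amount of data that has to be read.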
Based on the current information, the slow checksum may have the following causes:
- There is a large amount of historical (MVCC) data in the TiDB cluster, which slows down data reading. Check the GC configuration with the following query, and reduce tikv_gc_life_time appropriately:
  select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb;
  Refer to TiDB Garbage Collection (GC).
- The CPU of the TiKV nodes does not support the PCLMULQDQ or SSE 4.1 instructions. You can check this by running cat /proc/cpuinfo.
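If you prefer not to scan the full /proc/cpuinfo output by eye, a small helper can report which of the two flags are missing. This is a hypothetical sketch for Linux on x86: it assumes the kernel lists CPU features on lines named "flags" and uses the names pclmulqdq and sse4_1. On other architectures (e.g. ARM, where the line is named differently) both will simply be reported as missing.

```python
"""Report which CPU flags (PCLMULQDQ, SSE 4.1) are absent from /proc/cpuinfo."""

# Assumption: x86 Linux naming of the feature flags in /proc/cpuinfo.
REQUIRED_FLAGS = {"pclmulqdq", "sse4_1"}


def missing_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> set:
    """Return the required flags not listed on any 'flags' line of cpuinfo text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            present.update(value.split())
    return set(required) - present


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            missing = missing_flags(f.read())
    except FileNotFoundError:
        print("/proc/cpuinfo not found (not Linux?)")
    else:
        print("missing:", sorted(missing) or "none")
```

Run it on each TiKV host; an empty result means both instructions are available there.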
You can also skip the checksum calculation during a BR backup by adding --checksum=false.