Ask if there are adjustable parameters for TiDB's BR restore

translator_bot · June 20, 2024, 3:35pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 问一下tidb的br还原是否有可调参数

| username: tidb狂热爱好者

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
Can the restoration speed be increased?

translator_bot · June 20, 2024, 3:35pm

| username: tidb狂热爱好者 | Original post link

This is my backup speed at 33.3g, but the restore is exactly ten times slower. What could be the reason? Which parameter hasn’t been adjusted? Backup to S3 and restore from S3.

translator_bot · June 20, 2024, 3:35pm

| username: Kamner | Original post link

Where does it show that it’s slow? What was the original response time?

translator_bot · June 20, 2024, 3:35pm

| username: tidb狂热爱好者 | Original post link

Backup speed is 33GB, but restore speed is only 2.5GB.

translator_bot · June 20, 2024, 3:35pm

| username: Billmay表妹 | Original post link

Is there any background information? I can’t see anything from this.

translator_bot · June 20, 2024, 3:35pm

| username: Billmay表妹 | Original post link

Where to back up from? Where to restore to?

translator_bot · June 20, 2024, 3:35pm

| username: tidb狂热爱好者 | Original post link

I can only say that the restore speed is too slow. It might be due to the disk.

+00:00] [INFO] [collector.go:77] [“Full Restore success summary”] [total-ranges=246425] [ranges-succeed=246425] [ranges-failed=0] [split-region=8m29.304639926s] [restore-ranges=143985] [total-take=8h42m23.354499141s] [BackupTS=450047199759040513] [RestoreTS=450051441882038273] [total-kv=51476612568] [total-kv-size=6.522TB] [average-speed=208.1MB/s] [restore-data-size(after-compressed)=1.104TB] [Size=1104033608541]

Restoring 200MB per minute, a total of 6.5TB.

translator_bot · June 20, 2024, 3:35pm

| username: MrSylar | Original post link

Starting from v8.0.0, the br command-line tool introduces the --tikv-max-restore-concurrency parameter to control the maximum number of download and ingest files per TiKV node. Additionally, by adjusting this parameter, you can control the maximum length of the job queue (maximum length of the job queue = 32 * number of TiKV nodes * --tikv-max-restore-concurrency), thereby controlling the memory consumption of the BR node. Typically, --tikv-max-restore-concurrency is automatically adjusted based on the cluster configuration and does not require manual setting. If you observe through the TiKV-Details > Backup & Import > Import RPC count monitoring metric in Grafana that the number of download files remains close to 0 for a long time while the number of ingest files is always at the upper limit, it indicates that there is a backlog of ingest file tasks and the job queue has reached its maximum length. In this case, you can take the following measures to alleviate the task backlog issue:

Set the --ratelimit parameter to limit the download speed, ensuring that the ingest file tasks have sufficient resources. For example, if the disk throughput of any TiKV node is x MiB/s and the network bandwidth for downloading backup files is greater than x/2 MiB/s, you can set the parameter --ratelimit x/2. If the disk throughput of any TiKV node is x MiB/s and the network bandwidth for downloading backup files is less than or equal to x/2 MiB/s, you can leave the --ratelimit parameter unset.
Increase the --tikv-max-restore-concurrency to extend the maximum length of the job queue.

translator_bot · June 20, 2024, 3:35pm

| username: zhaokede | Original post link

Is the bottleneck in the hard drive or the network speed?

translator_bot · June 20, 2024, 3:35pm

| username: Miracle | Original post link

Please post the restore command.

translator_bot · June 20, 2024, 3:35pm

| username: mono | Original post link

Is the backup on the local machine, NFS, S3, or somewhere else? Check the network and disk performance metrics. If there are no bottlenecks, check the restore parameter settings. For example, if there are a large number of regions, see if there are settings like --granularity=“coarse-grained”.

translator_bot · June 20, 2024, 3:35pm

| username: TiDBer_QYr0vohO | Original post link

The key here still depends on network and disk performance.

translator_bot · June 20, 2024, 3:35pm

| username: WinterLiu | Original post link

Restoring from S3 is either due to slow network speed or slow disk write speed.