BR Backup is Very Slow

translator_bot · June 20, 2024, 1:38pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: BR 备份非常缓慢

| username: TiDBer_RywnG56h

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
We have 2 production environments, each with the same TiDB topology and resource specifications (both deployed in K8S through the operator); the data volume, query QPS, etc. are also similar. However, BR backup in one environment takes only 37 minutes, while in the other environment it takes 6 hours.

Do you have any troubleshooting ideas?

translator_bot · June 20, 2024, 1:38pm

| username: zhaokede | Original post link

Is the amount of backed-up data and the network situation the same?

translator_bot · June 20, 2024, 1:38pm

| username: TiDBer_RywnG56h | Original post link

The data volume is of the same order. The backup of the slow cluster (6 hours) to object storage is 35GB, while the backup of the fast cluster (37 minutes) to object storage is 24.5GB.

The network conditions are the same. The fast cluster is on Alibaba Cloud, backing up to OSS; the slow cluster is on AWS, backing up to S3. I don’t think there is a network bottleneck.

Moreover, the slow cluster has only started to become slow in the past half month.

translator_bot · June 20, 2024, 1:38pm

| username: TiDBer_RywnG56h | Original post link

In our AWS cluster, TiKV is set to 2 instances, but due to some anomalies, there are actually 4 instances of TiKV, one of which is down. This has caused some warn-level alerts during backup with BR, and I’m not sure if this is the reason.

Currently, the distribution of TiKV stores is as follows:

tikv:
    bootStrapped: true
    failoverUID: 186982df-36ba-4391-b21b-40bba57a2222
    failureStores:
      "104":
        createdAt: "2023-11-29T14:39:14Z"
        podName: basicai-tikv-2
        storeID: "104"
    image: pingcap/tikv:v6.1.0
    phase: Scale
    statefulSet:
      collisionCount: 0
      currentReplicas: 3
      currentRevision: basicai-tikv-654cf466dd
      observedGeneration: 8
      readyReplicas: 3
      replicas: 3
      updateRevision: basicai-tikv-654cf466dd
      updatedReplicas: 3
    stores:
      "1":
        id: "1"
        ip: basicai-tikv-0.basicai-tikv-peer.tidb-cluster.svc
        lastTransitionTime: "2024-06-05T13:56:51Z"
        leaderCount: 1030
        podName: basicai-tikv-0
        state: Up
      "6":
        id: "6"
        ip: basicai-tikv-1.basicai-tikv-peer.tidb-cluster.svc
        lastTransitionTime: "2024-06-05T13:55:12Z"
        leaderCount: 1030
        podName: basicai-tikv-1
        state: Up
      "104":
        id: "104"
        ip: basicai-tikv-2.basicai-tikv-peer.tidb-cluster.svc
        lastTransitionTime: "2024-06-07T15:38:32Z"
        leaderCount: 1037
        podName: basicai-tikv-2
        state: Up
      "30001":
        id: "30001"
        ip: basicai-tikv-3.basicai-tikv-peer.tidb-cluster.svc
        lastTransitionTime: "2023-12-04T13:44:47Z"
        leaderCount: 0  <----------- basicai-tikv-3 this POD does not exist, but there is a store with ID 3001
        podName: basicai-tikv-3
        state: Down