TiFlash data cannot synchronize properly

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash数据无法正常同步

| username: TiDBer_WZnVC06e

【TiDB Usage Environment】Production Environment
【TiDB Version】V6.1.2 in k8s
【Encountered Problem: Phenomenon and Impact】

  • Moved TiFlash to a new machine by scaling in first and then scaling out (the store id changed from 1846400725 to 2751805262).

  • After executing alter table XXX set tiflash replica 1, the data is not synchronized.

  • PD added the placement rule successfully, but there is no running operator, and the PD log contains the following warnings (check commands are sketched after the store information below):

    [screenshot, 2023-04-18 2:26 PM]

[2023/04/14 15:06:46.145 +08:00] [WARN] [cluster.go:1296] ["store has been Tombstone"] [store-id=1846400725] [store-address=pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:3930] [state=Up] [physically-destroyed=false]

[2023/04/14 15:07:32.041 +08:00] [WARN] [cluster.go:1145] ["not found the key match with the store label"] [store="id:2751805262 address:\"pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:3930\" labels:<key:\"engine\" value:\"tiflash\" > version:\"v6.1.2\" peer_address:\"pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:20170\" status_address:\"pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:20292\" git_hash:\"2fa392de68269ac35827e2fd40f4aaef316e3316\" start_timestamp:1681456052 deploy_path:\"/tiflash\" "] [label-key=engine]
  • The store information is as follows; no anomalies were found in the TiFlash log:
     "store": {
        "id": 2751805262,
        "address": "pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v6.1.2",
        "peer_address": "pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:20170",
        "status_address": "pre-bigdata-tidb-tiflash-0.pre-bigdata-tidb-tiflash-peer.pre-bigdata-tidb.svc:20292",
        "git_hash": "2fa392de68269ac35827e2fd40f4aaef316e3316",
        "start_timestamp": 1681460309,
        "deploy_path": "/tiflash",
        "last_heartbeat": 1681789934080047340,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.718TiB",
        "available": "1.608TiB",
        "used_size": "21.96MiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 4,
        "region_weight": 1,
        "region_score": 4,
        "region_size": 4,
        "slow_score": 1,
        "start_ts": "2023-04-14T16:18:29+08:00",
        "last_heartbeat_ts": "2023-04-18T11:52:14.08004734+08:00",
        "uptime": "91h33m45.08004734s"
      }
    }
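
To see whether the replica is actually making progress, the sync state can be checked from TiDB and the rule/scheduling state from PD. A minimal sketch, assuming pd-ctl can reach PD at http://pd:2379 (on Kubernetes this usually means exec-ing into a PD pod first); the host names here are placeholders:

    # Replication progress as reported by TiDB (AVAILABLE/PROGRESS stay at 0 if nothing is syncing)
    mysql -h <tidb-host> -P 4000 -u root -e "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica;"

    # Placement rules PD created for the TiFlash replica
    pd-ctl -u http://pd:2379 config placement-rules show

    # Store list and any operators PD is currently running
    pd-ctl -u http://pd:2379 store
    pd-ctl -u http://pd:2379 operator show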
| username: 裤衩儿飞上天 | Original post link

  1. Did you change the IP?
  2. Did you execute alter table XXX set tiflash replica 1 before scaling down?
  3. What is the current status of store 1846400725?
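
For question 3, the old store can usually still be queried by its id even after it becomes Tombstone; a sketch, assuming the same PD endpoint as above:

    # State of the old TiFlash store; a clean scale-in ends with state_name = "Tombstone"
    pd-ctl -u http://pd:2379 store 1846400725

    # Once it is Tombstone, its metadata can be cleared so PD stops referencing it
    pd-ctl -u http://pd:2379 store remove-tombstone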
| username: TiDBer_WZnVC06e | Original post link

  1. The cluster is deployed on k8s, and the pod names of the two stores are the same.
  2. Before scaling in, no table had a TiFlash replica (but the TiFlash region count was not zero; I suspect this was left over from dropping a database directly, which may have triggered a bug; a way to check is sketched below).
  3. Now this store can no longer be seen in pd-ctl, and the PD log prints: "store has been Tombstone".
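
One way to verify the "region count is not zero" suspicion is to list the regions PD still attributes to the TiFlash stores; a sketch, again assuming pd-ctl reaches PD at http://pd:2379:

    # Regions PD still places on the old and the new TiFlash store
    pd-ctl -u http://pd:2379 region store 1846400725
    pd-ctl -u http://pd:2379 region store 2751805262

    # location-labels PD is configured with; "engine" is normally not in this list,
    # which is the likely source of the "not found the key match with the store label" warning
    pd-ctl -u http://pd:2379 config show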