After a TiKV Node Goes Offline, Region Distribution Becomes Uneven

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv节点下线后,region 分布不均匀

| username: Hacker_小峰

[TiDB Usage Environment] Production Environment / Test / PoC
[TiDB Version] v5.4.3
[Reproduction Path] Two TiKV instances are deployed on each physical machine (on different disks of the same machine), for a total of 6 TiKV instances across 3 physical machines. When one TiKV instance on a physical machine is shut down, its regions migrate to the other TiKV instance on the same machine, causing uneven region distribution.

[Resource Configuration] 6 TiKV, 3 PD, 3 TiDB (2 TiKV nodes on each physical machine, 3 TiKV physical machines with identical hardware)
[Encountered Problem: Phenomenon and Impact]
Relevant configuration is as follows:

server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    raftdb.defaultcf.force-consistency-checks: false
    raftstore.apply-max-batch-size: 256
    raftstore.apply-pool-size: 8
    raftstore.hibernate-regions: true
    raftstore.raft-max-inflight-msgs: 2048
    raftstore.store-max-batch-size: 256
    raftstore.store-pool-size: 8
    raftstore.sync-log: false
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: true
    readpool.unified.max-thread-count: 24
    rocksdb.defaultcf.force-consistency-checks: false
    rocksdb.lockcf.force-consistency-checks: false
    rocksdb.raftcf.force-consistency-checks: false
    rocksdb.writecf.force-consistency-checks: false
    server.grpc-concurrency: 8
    storage.block-cache.capacity: 32G
    storage.scheduler-worker-pool-size: 8
  pd:
    dashboard.public-path-prefix: /test-tidb
    replication.enable-placement-rules: true
    replication.location-labels:
    - zone
    - rack
    - host
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
» config show leader-scheduler-limit
{
  "replication": {
    "enable-placement-rules": "true",
    "enable-placement-rules-cache": "false",
    "isolation-level": "",
    "location-labels": "zone,rack,host",
    "max-replicas": 3,
    "strictly-match-label": "true"
  },
  "schedule": {
    "enable-cross-table-merge": "true",
    "enable-joint-consensus": "true",
    "high-space-ratio": 0.7,
    "hot-region-cache-hits-threshold": 3,
    "hot-region-schedule-limit": 8,
    "hot-regions-reserved-days": 7,
    "hot-regions-write-interval": "10m0s",
    "leader-schedule-limit": 8,
    "leader-schedule-policy": "count",
    "low-space-ratio": 0.8,
    "max-merge-region-keys": 200000,
    "max-merge-region-size": 20,
    "max-pending-peer-count": 64,
    "max-snapshot-count": 64,
    "max-store-down-time": "30m0s",
    "merge-schedule-limit": 8,
    "patrol-region-interval": "10ms",
    "region-schedule-limit": 4096,
    "region-score-formula-version": "v2",
    "replica-schedule-limit": 64,
    "split-merge-interval": "1h0m0s",
    "tolerant-size-ratio": 0
  }
}

[Attachments: Screenshots/Logs/Monitoring]



Why do the regions migrate to another TiKV node (store-396786) on the same physical machine when store-140936 is shut down, instead of balancing the migration across the remaining 5 TiKV nodes?

mysql> SELECT t1.store_id,
    ->        SUM(CASE WHEN t1.is_leader = 1 THEN 1 ELSE 0 END) AS leader_cnt,
    ->        COUNT(t1.peer_id) AS region_cnt
    ->   FROM information_schema.tikv_region_peers t1
    ->  GROUP BY t1.store_id;
+----------+------------+------------+
| store_id | leader_cnt | region_cnt |
+----------+------------+------------+
|        3 |       6840 |      16757 |
|   140936 |          0 |      12275 |
|        6 |       6846 |      17050 |
|        7 |       6838 |      17151 |
|   396786 |       6839 |      21927 |
|        5 |       6838 |      17444 |
+----------+------------+------------+
6 rows in set (1.33 sec)
| username: tidb狂热爱好者 | Original post link

You need to set a placement policy and apply labels to the nodes.

| username: Hacker_小峰 | Original post link

Yes, the labels have already been added. Details are as follows:

tikv_servers:
- host: 192.1.1.3
  ssh_port: 22
  port: 20175
  status_port: 20185
  deploy_dir: /data01/tidb-deploy/tikv-20175
  data_dir: /data01/tidb-data/tikv-20175
  log_dir: /data/tidb-deploy/tikv-20175/log
  config:
    server.labels:
      host: tikv3
      rack: E06
      zone: qsh
  arch: amd64
  os: linux
- host: 192.1.1.3
  ssh_port: 22
  port: 20176
  status_port: 20186
  deploy_dir: /data02/tidb-deploy/tikv-20176
  data_dir: /data02/tidb-data/tikv-20176
  log_dir: /data/tidb-deploy/tikv-20176/log
  config:
    server.labels:
      host: tikv3
      rack: E06
      zone: qsh
- host: 192.1.1.4
  ssh_port: 22
  port: 20175
  status_port: 20185
  deploy_dir: /data01/tidb-deploy/tikv-20175
  data_dir: /data01/tidb-data/tikv-20175
  log_dir: /data/tidb-deploy/tikv-20175/log
  config:
    server.labels:
      host: tikv4
      rack: F06
      zone: qsh
  arch: amd64
  os: linux
- host: 192.1.1.4
  ssh_port: 22
  port: 20176
  status_port: 20186
  deploy_dir: /data02/tidb-deploy/tikv-20176
  data_dir: /data02/tidb-data/tikv-20176
  log_dir: /data/tidb-deploy/tikv-20176/log
  config:
    server.labels:
      host: tikv4
      rack: F06
      zone: qsh
- host: 192.1.1.5
  ssh_port: 22
  port: 20177
  status_port: 20187
  deploy_dir: /data02/tidb-deploy/tikv-20177
  data_dir: /data02/tidb-data/tikv-20177
  log_dir: /data02/tidb-deploy/tikv-20177/log
  config:
    server.labels:
      host: tikv5
      rack: E06
      zone: qsh
  arch: amd64
  os: linux
- host: 192.1.1.5
  ssh_port: 22
  port: 20179
  status_port: 20189
  deploy_dir: /data03/tidb-deploy/tikv-20179
  data_dir: /data03/tidb-data/tikv-20179
  log_dir: /data03/tidb-deploy/tikv-20179/log
  config:
    server.labels:
      host: tikv5
      rack: E06
      zone: qsh  

The operation performed was: tiup cluster stop tidb-test --node 192.1.1.5:20177
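To double-check how PD actually groups these instances, you could pull the store list from PD's HTTP API and group addresses by the host label. The sketch below is illustrative: the endpoint path and JSON shape follow PD's `/pd/api/v1/stores` API, but the parsing is exercised against a hand-built sample rather than a live cluster:

```python
import json
from collections import defaultdict

def group_stores_by_label(stores_json: str, key: str = "host") -> dict:
    """Group TiKV store addresses by the value of one placement label."""
    groups = defaultdict(list)
    for item in json.loads(stores_json)["stores"]:
        store = item["store"]
        labels = {l["key"]: l["value"] for l in store.get("labels", [])}
        groups[labels.get(key, "<unlabeled>")].append(store["address"])
    return dict(groups)

# Hand-built sample mimicking `curl http://<pd-address>:2379/pd/api/v1/stores`
# (only the fields used above are included).
sample = json.dumps({"stores": [
    {"store": {"address": "192.1.1.5:20177",
               "labels": [{"key": "host", "value": "tikv5"}]}},
    {"store": {"address": "192.1.1.5:20179",
               "labels": [{"key": "host", "value": "tikv5"}]}},
]})
print(group_stores_by_label(sample))
# Both instances on 192.1.1.5 fall into the same "tikv5" group.
```

If both stores on one machine land in the same group here, PD treats them as a single isolation unit for that label level.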

| username: Billmay表妹 | Original post link

You can check out @h5n1’s content in this article~

| username: Hacker_小峰 | Original post link

Thank you, cousin.

| username: 考试没答案 | Original post link

Last time, the magic trick of restarting the PD leader solved a similar issue. If the SOP content above doesn't work for you, you can also try restarting it.

| username: 考试没答案 | Original post link

Let me ask a question: why v5.4.3 rather than a newer version?

| username: 裤衩儿飞上天 | Original post link

Apply labels to the nodes.

| username: Hacker_小峰 | Original post link

The current version is sufficient, so I haven’t updated to the latest version yet.

| username: Hacker_小峰 | Original post link

Thanks! I’ll give it a try~

| username: TiDBer_jYQINSnf | Original post link

Although you configured three label levels (zone, rack, and host), only the host label actually distinguishes anything, right?
That is, PD sees three isolation groups:
the two tikv3 instances as one group,
the two tikv4 instances as one group,
the two tikv5 instances as one group.

When you take any TiKV instance offline, PD must keep at most one replica per group, so the displaced replica can only move to the other instance on the same physical machine.
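The constraint described above can be sketched as a toy model. This is not PD's actual scheduler code, just the "at most one replica per host group" rule applied to the 6-store / 3-host topology in this thread (store names are made up for illustration):

```python
# Toy model of PD's label isolation check (illustrative only, not PD's code).
def eligible_targets(stores, replica_hosts, down, moving_host):
    """Stores that can receive the replica leaving `moving_host`:
    the store must be up, and must not sit on a host that already
    holds one of the *other* two replicas."""
    other_hosts = set(replica_hosts) - {moving_host}
    return [s for s, host in stores.items()
            if s not in down and host not in other_hosts]

# 6 stores on 3 hosts, mirroring the cluster in this thread.
stores = {"tikv3-a": "tikv3", "tikv3-b": "tikv3",
          "tikv4-a": "tikv4", "tikv4-b": "tikv4",
          "tikv5-a": "tikv5", "tikv5-b": "tikv5"}
# A region's 3 replicas live on tikv5-a, tikv3-a, tikv4-a; tikv5-a is stopped.
print(eligible_targets(stores, ["tikv5", "tikv3", "tikv4"],
                       down={"tikv5-a"}, moving_host="tikv5"))
# -> ['tikv5-b']: the only legal target is the other instance
#    on the same physical machine, exactly what was observed.
```

With only 3 host groups and 3 replicas, every group is already occupied, so the model leaves exactly one choice.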

| username: Hacker_小峰 | Original post link

Ah, that's a real eye-opener; this must be the reason. The only label that actually takes effect is probably host…

| username: TiDBer_jYQINSnf | Original post link

For the rack label to make a difference, you need more than 3 hosts. With one replica placed on each rack, shutting down an entire rack loses at most one replica instead of two.

| username: Hacker_小峰 | Original post link

:+1:

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.