Using Placement Rules in Version 5.0.4 to Implement a Same-Data-Center 5-Replica (3+2) Architecture: miss-peer Count Keeps Growing

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 5.0.4 版本使用placement rule 实现同机房5(3+2)架构 miss peer 持续增长

| username: TiDBer_yyy

[TiDB Usage Environment] Test
[TiDB Version] 5.0.4
Cluster Architecture

Original architecture: the 10.xxx.xx.xx TiKV machines had not yet been scaled out; only the 172.x machines were deployed.

Operations:

  1. Modify the server.labels of the TiKV nodes in the BJ1 data center:
    tiup cluster edit-config xxx && tiup cluster reload xxx -R tikv -y

  2. Prevent the BJ4 data center from electing leaders:
    tiup ctl:v5.0.5 pd --pd=http://127.0.0.1:2379 config set label-property reject-leader dc bj4

  3. Scale out 3 TiKV machines at 10.x.x.x:20160 (a labeled topology sketch is shown after the rules below):
    tiup cluster scale-out tidb_placement_rule_remove scale-out-bj4.yaml -u root -p

  4. Apply placement rules that place 2 replicas in the BJ4 data center, giving 5 replicas in total (3+2):

tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 config placement-rules rule-bundle save --in=rules.json
tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 config placement-rules show

Rules:

[
  {
    "group_id": "pd",
    "id": "dc-bj1",
    "start_key": "",
    "end_key": "",
    "role": "voter",
    "count": 3,
    "label_constraints": [
      {
        "key": "dc",
        "op": "in",
        "values": [
          "bj1"
        ]
      }
    ],
    "location_labels": [
      "dc"
    ]
  },
  {
    "group_id": "pd",
    "id": "dc-bj4",
    "start_key": "",
    "end_key": "",
    "role": "follower",
    "count": 2,
    "label_constraints": [
      {
        "key": "dc",
        "op": "in",
        "values": [
          "bj4"
        ]
      }
    ],
    "location_labels": [
      "dc"
    ]
  }
]
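
For reference, a rough sketch of what the BJ4 scale-out topology and a post-save sanity check could look like. The IPs, file contents, and grep patterns below are illustrative assumptions, not taken from the original post; only the PD address and ctl version match the commands above.

# hypothetical scale-out-bj4.yaml: the new 10.x stores must carry the dc=bj4 label,
# otherwise neither rule matches them and PD has nowhere to put the two extra replicas
cat > scale-out-bj4.yaml <<'EOF'
tikv_servers:
  - host: 10.0.0.1
    port: 20160
    config:
      server.labels: { dc: "bj4" }
  - host: 10.0.0.2
    port: 20160
    config:
      server.labels: { dc: "bj4" }
  - host: 10.0.0.3
    port: 20160
    config:
      server.labels: { dc: "bj4" }
EOF

# confirm every store reports the expected dc label after the reload/scale-out
tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 store | grep -E '"address"|"key"|"value"'

# regions PD still considers short of a replica under the current rules
tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 region check miss-peer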

Phenomenon:

  1. After 2 hours, only one new replica per region had been added in the BJ4 data center, and all of these new replicas were on a single TiKV machine. During this time, miss-peer-region-count kept increasing.

  2. After the 4-replica scheduling finished, I waited another half an hour and there were still no 5-replica regions; the number of regions with 5 replicas was 0:

tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length == 5)}" |wc -l 
  3. Checked the logs of the newly scaled-out TiKV nodes.

  4. After restarting the TiKV nodes, scheduling of the 5th region replica started and miss-peer began to decrease (see the pd-ctl inspection sketch below):

tiup cluster restart xxx -N tikv-02,tikv-03
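
For future reference, pd-ctl can show whether PD had actually generated operators for the new stores before resorting to a restart. A sketch, using the same assumed PD address as above:

# operators PD currently has in flight (add-peer, transfer-leader, ...)
tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 operator show

# schedulers that are currently enabled
tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 scheduler show

# state of the newly added stores (Up/Offline, region_count, leader_count, labels)
tiup ctl:v5.0.4 pd --pd=http://127.0.0.1:2379 store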


Questions:

  1. Why was the 5th replica not created before TiKV was restarted?

  2. Why did the other two TiKV nodes receive no region scheduling before the restart?

| username: Jellybean | Original post link

It is recommended to use the Placement Rules feature from version 6.1 onwards, as many issues have been fixed there. From 6.1 it is also an officially GA feature, and placement can be driven through SQL, which is much more convenient.

You can refer to my earlier write-up: Column - TiDB Hot and Cold Data Storage Separation Solution | TiDB Community
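
For context, from v6.1 the same 3+2 layout can also be expressed through SQL placement policies instead of raw PD rule files. A minimal sketch; the policy, database, and table names are made up, and the TiDB address and port are assumptions:

# hypothetical Placement Rules in SQL example (v6.1+), equivalent in spirit to the
# dc-bj1/dc-bj4 rules above: leader plus two followers in bj1, two followers in bj4
mysql -h 127.0.0.1 -P 4000 -u root -e "
  CREATE PLACEMENT POLICY bj1_bj4_3_2
    LEADER_CONSTRAINTS='[+dc=bj1]'
    FOLLOWER_CONSTRAINTS='{\"+dc=bj1\": 2, \"+dc=bj4\": 2}';
  ALTER TABLE test.t PLACEMENT POLICY = bj1_bj4_3_2;"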

| username: TiDBer_yyy | Original post link

It has a lot of potential, and it is also very useful for disaster recovery.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.