After Scaling In the Cluster, One Server Is Short on Capacity and Two Nodes on It Frequently Show Disconnect

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 缩容集群后,一台服务器容量不足,上面的2个节点经常显示disconnect

| username: lizhuolin2020

[TiDB Usage Environment] Stress Testing
[TiDB Version] v4.0.6
[Reproduction Path] The cluster originally had 6 TiKV nodes. 3 new TiKV nodes were added on a single physical machine to scale out, and the original 6 TiKV nodes are being scaled in. During the scale-in, for performance reasons, both region-schedule-limit and leader-schedule-limit were set to 0 so that regions and leaders would not take part in score-based balancing.
[Encountered Issue: Problem Phenomenon and Impact]
The disk usage on the physical machine hosting the new nodes is higher than expected. Once disk usage reaches 95%, two of the nodes frequently show Disconnect. Two more nodes have since been added on other physical machines, and both region-schedule-limit and leader-schedule-limit have been set to 1500, but the situation has not improved.
[Resource Configuration]

There are 6 machines in total, all physical machines with 40 cores, 128 GB of RAM, and 5 TB of disk (40C/128G/5T).
10.58.100.152-20232.last.log.zip (1.3 MB)
10.58.100.152-20231.last.log.zip (1.5 MB)
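
For reference, the schedule limits described above are PD settings that can be changed online with pd-ctl. A minimal sketch, assuming pd-ctl is available; the PD address is a placeholder and 1500 is simply the value quoted above:

    # check store status and space usage; stores on nearly full disks are where the Disconnects show up
    pd-ctl -u http://<pd-host>:2379 store

    # re-enable scheduling once the scale-in is finished (0 disables it entirely)
    pd-ctl -u http://<pd-host>:2379 config set region-schedule-limit 1500
    pd-ctl -u http://<pd-host>:2379 config set leader-schedule-limit 1500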

| username: xfworld | Original post link

It is recommended to use 6.1.X or 6.5.X for stress testing…

For stress testing, also follow the reference configuration in the official documentation; mixed (multi-instance) deployment has its own requirements and limitations.

| username: tidb菜鸟一只 | Original post link

Are the three newly added TiKV nodes on the same physical machine isolated in terms of resources?

| username: lizhuolin2020 | Original post link

They are bound to different CPUs, but the storage directory is the same.

| username: BraveChen | Original post link

Mixed deployment requires resource isolation.
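
To make "resource isolation" concrete: when several TiKV instances share one machine, each instance should be told explicitly how much memory and disk it owns, otherwise every instance assumes it has the whole host. A hedged sketch of what the per-instance `config:` block in the topology file could look like; the 12GB, 1500GB, and thread-count figures are only illustrative assumptions for a 128G / 5T machine shared by three instances:

    config:
      storage.block-cache.capacity: "12GB"     # cap the RocksDB block cache per instance instead of letting each assume all 128G
      raftstore.capacity: "1500GB"             # advertise only this instance's share of the 5T disk to PD
      readpool.unified.max-thread-count: 10    # roughly (cores * 0.8) / instances-per-host
      server.labels: { host: "tikv8" }         # shared host label so PD can keep replicas of one region on different hosts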

| username: xingzhenxiang | Original post link

In practice, there should be a single data node per disk.


tikv_servers:
  - host: 10.10.109.103
    port: 20160
    status_port: 20180
    deploy_dir: "/export/tikv1/tidb-deploy/tikv-20160"
    data_dir: "/export/tikv1/tidb-data/tikv-20160"
    log_dir: "/export/tikv1/tidb-deploy/tikv-20160/log"
    numa_node: "0"
    config:
      server.labels: { host: "tikv8" }
  - host: 10.10.109.103
    port: 20161
    status_port: 20181
    deploy_dir: "/export/tikv2/tidb-deploy/tikv-20161"
    data_dir: "/export/tikv2/tidb-data/tikv-20161"
    log_dir: "/export/tikv2/tidb-deploy/tikv-20161/log"
    numa_node: "1"
    config:
      server.labels: { host: "tikv8" }
  - host: 10.10.109.103
    port: 20162
    status_port: 20182
    deploy_dir: "/export/tikv3/tidb-deploy/tikv-20162"
    data_dir: "/export/tikv3/tidb-data/tikv-20162"
    log_dir: "/export/tikv3/tidb-deploy/tikv-20162/log"
    numa_node: "0"
    config:
      server.labels: { host: "tikv8" }
  - host: 10.10.109.103
    port: 20163
    status_port: 20183
    deploy_dir: "/export/tikv4/tidb-deploy/tikv-20163"
    data_dir: "/export/tikv4/tidb-data/tikv-20163"
    log_dir: "/export/tikv4/tidb-deploy/tikv-20163/log"
    numa_node: "1"
    config:
      server.labels: { host: "tikv8" }
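
A short usage note on the topology above, as a hedged sketch (the cluster name, file name, and PD address are placeholders): the host labels only affect replica placement once PD knows that "host" is an isolation level, and the new instances are then added with tiup scale-out.

    # tell PD that "host" is an isolation level, so two replicas of the same
    # region are not placed on TiKV instances that share one host label
    pd-ctl -u http://<pd-host>:2379 config set location-labels host

    # add the instances defined above (file and cluster names are placeholders)
    tiup cluster scale-out <cluster-name> scale-out.yaml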