TiDB Cluster Management 303 Hands-On Manual: A Detailed Walkthrough of Online TiKV Scale-In

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB集群管理-303 实例手册之TiKV在线缩容操作详解

| username: linuxmysql

Check the current cluster status:
tiup cluster display tidb-test
tiup is checking updates for component cluster …
A new version of cluster is available:
The latest version: v1.15.0
Local installed version: v1.9.3
Update current component: tiup update cluster
Update all components: tiup update --all

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v6.1.0
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://172.16.1.203:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir


172.16.1.201:9093 alertmanager 172.16.1.201 9093/9094 linux/x86_64 Up /tidb-data/alertmanager-9093 /tidb-deploy/alertmanager-9093
172.16.1.201:3000 grafana 172.16.1.201 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
172.16.1.201:2379 pd 172.16.1.201 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.16.1.202:2379 pd 172.16.1.202 2379/2380 linux/x86_64 Up|L /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.16.1.203:2379 pd 172.16.1.203 2379/2380 linux/x86_64 Up|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.16.1.201:9090 prometheus 172.16.1.201 9090/12020 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
172.16.1.201:4000 tidb 172.16.1.201 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
172.16.1.201:20160 tikv 172.16.1.201 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.16.1.202:20160 tikv 172.16.1.202 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.16.1.203:20160 tikv 172.16.1.203 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.16.1.204:20160 tikv 172.16.1.204 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
Total nodes: 11

Scale in the node 172.16.1.204:
tiup cluster scale-in tidb-test --node 172.16.1.204:20160
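Before scaling in, it is worth confirming that enough TiKV stores will remain to satisfy the replica count (PD's max-replicas defaults to 3, so at least 3 stores must survive the removal). A minimal sketch, assuming the display output layout shown above; the helper name count_tikv is made up for illustration:

```shell
# count_tikv: count TiKV rows in `tiup cluster display` output read from stdin.
# Assumes Role is the 2nd whitespace-separated column, as in the table above.
count_tikv() {
  awk '$2 == "tikv" { n++ } END { print n + 0 }'
}

# Live usage (not run here): the count should stay >= 3 after removing one node.
#   tiup cluster display tidb-test | count_tikv
```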

[tidb@node201 ~]$ tiup cluster scale-in tidb-test --node 172.16.1.204:20160

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster scale-in tidb-test --node 172.16.1.204:20160
This operation will delete the 172.16.1.204:20160 nodes in tidb-test and all their data.
Do you want to continue? [y/N]:(default=N) y
Scale-in nodes…

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.202
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.203
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.202
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.204
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.203
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles: Nodes:[172.16.1.204:20160] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:tidb SSHProxyIdentity:/home/tidb/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles: RetainDataNodes: ShowUptime:false DisplayMode:default Operation:StartOperation}
    The component tikv will become tombstone, maybe exists in several minutes or hours, after that you can use the prune command to clean it
  • [ Serial ] - UpdateMeta: cluster=tidb-test, deleted=''
  • [ Serial ] - UpdateTopology: cluster=tidb-test
  • Refresh instance configs
    • Generate config pd → 172.16.1.201:2379 … Done
    • Generate config pd → 172.16.1.202:2379 … Done
    • Generate config pd → 172.16.1.203:2379 … Done
    • Generate config tikv → 172.16.1.201:20160 … Done
    • Generate config tikv → 172.16.1.202:20160 … Done
    • Generate config tikv → 172.16.1.203:20160 … Done
    • Generate config tidb → 172.16.1.201:4000 … Done
    • Generate config prometheus → 172.16.1.201:9090 … Done
    • Generate config grafana → 172.16.1.201:3000 … Done
    • Generate config alertmanager → 172.16.1.201:9093 … Done
  • Reload prometheus and grafana
    • Reload prometheus → 172.16.1.201:9090 … Done
    • Reload grafana → 172.16.1.201:3000 … Done
      Scaled cluster tidb-test in successfully

Check the cluster status:
tiup cluster display tidb-test

[tidb@node201 ~]$ tiup cluster display tidb-test

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v6.1.0
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://172.16.1.203:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir


172.16.1.201:9093 alertmanager 172.16.1.201 9093/9094 linux/x86_64 Up /tidb-data/alertmanager-9093 /tidb-deploy/alertmanager-9093
172.16.1.201:3000 grafana 172.16.1.201 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
172.16.1.201:2379 pd 172.16.1.201 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.16.1.202:2379 pd 172.16.1.202 2379/2380 linux/x86_64 Up|L /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.16.1.203:2379 pd 172.16.1.203 2379/2380 linux/x86_64 Up|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.16.1.201:9090 prometheus 172.16.1.201 9090/12020 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
172.16.1.201:4000 tidb 172.16.1.201 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
172.16.1.201:20160 tikv 172.16.1.201 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.16.1.202:20160 tikv 172.16.1.202 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.16.1.203:20160 tikv 172.16.1.203 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.16.1.204:20160 tikv 172.16.1.204 20160/20180 linux/x86_64 Tombstone /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
The status of the scaled-in node has changed to Tombstone, which means PD has finished migrating its Region replicas and the node is fully offline.
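As the scale-in log notes, the transition to Tombstone can take minutes or hours depending on how much data must be migrated. One way to wait for it is to poll the display output. This is only a sketch: the Status column position (6th field) is inferred from the table layout above, and node_status is a hypothetical helper name:

```shell
# node_status: print the Status column for a given node ID from
# `tiup cluster display` output read on stdin.
node_status() {
  awk -v node="$1" '$1 == node { print $6 }'
}

# Live usage (the polling interval is an arbitrary choice):
#   until tiup cluster display tidb-test | node_status 172.16.1.204:20160 | grep -q Tombstone; do
#     sleep 60
#   done
```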

Execute tiup cluster prune tidb-test to clean up the Tombstone node's metadata and residual files. Note: this operation also reloads the monitoring services (Prometheus and Grafana).
tiup cluster prune tidb-test
[tidb@node201 ~]$ tiup cluster prune tidb-test

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster /home/tidb/.tiup/components/cluster/v1.9.3/tiup-cluster prune tidb-test

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.202
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.203
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.204
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.203
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.201
  • [Parallel] - UserSSH: user=tidb, host=172.16.1.202
  • [ Serial ] - FindTomestoneNodes
    Will destroy these nodes: [172.16.1.204:20160]
    Do you confirm this action? [y/N]:(default=N) y
    Start destroy Tombstone nodes: [172.16.1.204:20160] …
  • [ Serial ] - ClusterOperate: operation=DestroyTombstoneOperation, options={Roles: Nodes: Force:true SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:true NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:tidb SSHProxyIdentity:/home/tidb/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles: RetainDataNodes: ShowUptime:false DisplayMode:default Operation:StartOperation}
    Stopping component tikv
    Stopping instance 172.16.1.204
    Stop tikv 172.16.1.204:20160 success
    Destroying component tikv
    Destroying instance 172.16.1.204
    Destroy 172.16.1.204 success
  • Destroy tikv paths: [/tidb-data/tikv-20160 /tidb-deploy/tikv-20160/log /tidb-deploy/tikv-20160 /etc/systemd/system/tikv-20160.service]
    Stopping component node_exporter
    Stopping instance 172.16.1.204
    Stop 172.16.1.204 success
    Stopping component blackbox_exporter
    Stopping instance 172.16.1.204
    Stop 172.16.1.204 success
    Destroying monitored 172.16.1.204
    Destroying instance 172.16.1.204
    Destroy monitored on 172.16.1.204 success
    Delete public key 172.16.1.204
    Delete public key 172.16.1.204 success
  • [ Serial ] - UpdateMeta: cluster=tidb-test, deleted='172.16.1.204:20160'
  • [ Serial ] - UpdateTopology: cluster=tidb-test
  • Refresh instance configs
    • Generate config pd → 172.16.1.201:2379 … Done
    • Generate config pd → 172.16.1.202:2379 … Done
    • Generate config pd → 172.16.1.203:2379 … Done
    • Generate config tikv → 172.16.1.201:20160 … Done
    • Generate config tikv → 172.16.1.202:20160 … Done
    • Generate config tikv → 172.16.1.203:20160 … Done
    • Generate config tidb → 172.16.1.201:4000 … Done
    • Generate config prometheus → 172.16.1.201:9090 … Done
    • Generate config grafana → 172.16.1.201:3000 … Done
    • Generate config alertmanager → 172.16.1.201:9093 … Done
  • Reload prometheus and grafana
    • Reload prometheus → 172.16.1.201:9090 … Done
    • Reload grafana → 172.16.1.201:3000 … Done
      Destroy success

Check the cluster status again; 172.16.1.204 has been removed from the cluster:
[tidb@node201 ~]$ tiup cluster display tidb-test
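As a final check, the removal can be verified in a script as well. A sketch assuming the same display layout; node_gone is a hypothetical helper:

```shell
# node_gone: succeed (exit 0) if the given node ID no longer appears
# at the start of any line read on stdin.
node_gone() {
  ! grep -q "^$1 "
}

# Live usage:
#   tiup cluster display tidb-test | node_gone "172.16.1.204:20160" && echo "node removed"
```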