After KV Node Removal, Commit Time Increases

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: kv节点剔除后,commit耗时增加

| username: TiDBer_8AZK5nuo

To improve efficiency, please provide the following information. Clear problem descriptions can lead to quicker resolutions:
[TiDB Usage Environment]
Production Business
[Overview] Scenario + Problem Overview
Due to a hardware failure, a TiKV node was shut down. After it was shut down, the commit latency shown in the TiKV duration metrics increased by nearly 10 times.
[Background] Operations performed
After the hardware failure of the TiKV node, the problematic TiKV was shut down and taken offline.
[Phenomenon] Business and database phenomena
The select and DML operations of the entire TiDB cluster have slowed down.
[Problem] Current issue encountered
The select and DML operations of the entire TiDB cluster have slowed down.
[Business Impact]
The select and DML operations of the entire TiDB cluster have slowed down.
[TiDB Version]
mysql> select tidb_version();
+--------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                     |
+--------------------------------------------------------------------------------------------------+
| Release Version: v3.1.0
Git Commit Hash: 1347df814de0d603ef844d53b2c8d54fc760b75e
Git Branch: heads/refs/tags/v3.1.0
UTC Build Time: 2020-04-16 09:38:11
GoVersion: go version go1.13 linux/amd64
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+--------------------------------------------------------------------------------------------------+

[Application Software and Version]

[Attachments] Relevant logs and configuration information

  • TiUP Cluster Display information
  • TiUP Cluster Edit config information

Monitoring (https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana monitoring
  • TiDB Grafana monitoring
  • TiKV Grafana monitoring
  • PD Grafana monitoring
  • Corresponding module logs (including logs 1 hour before and after the issue)



For questions related to performance optimization and fault troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload.

| username: h5n1 | Original post link

Check whether leader/Region balancing has completed after the TiKV was shut down, and check the disk I/O situation.
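For reference, checking both is usually something like the following with pd-ctl and iostat (the PD address is a placeholder, adjust it to your deployment):

pd-ctl -u http://<pd-address>:2379 store          # compare leader_count / region_count across the remaining stores
pd-ctl -u http://<pd-address>:2379 operator show  # any balance/replica operators still pending?
iostat -x 1                                       # watch %util and await on the TiKV data disks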

| username: TiDBer_8AZK5nuo | Original post link

How long do you estimate it will take?

| username: h5n1 | Original post link

Is the disk I/O busy?

| username: TiDBer_8AZK5nuo | Original post link

Removed a node

| username: h5n1 | Original post link

Is it an NVMe drive? The read/write volume doesn’t seem high, but the utilization is maxed out.

| username: TiDBer_8AZK5nuo | Original post link

I have now turned off Region balancing. I checked the statistics and there are no background tasks, so I don't know what is causing the high I/O.
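Roughly, this kind of thing is done by zeroing PD's scheduling limits with pd-ctl (the address is a placeholder, and these may not be the exact commands used here):

pd-ctl -u http://<pd-address>:2379 config set region-schedule-limit 0
pd-ctl -u http://<pd-address>:2379 config set leader-schedule-limit 0
pd-ctl -u http://<pd-address>:2379 config set replica-schedule-limit 0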

| username: TiDBer_8AZK5nuo | Original post link

It's an NVMe disk.

| username: 近墨者zyl | Original post link

It should be Region leader re-election and PD's Region scheduling.
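If so, the scheduling activity can be confirmed from PD, roughly like this (placeholder address); the PD Grafana Operator panels show the same information:

pd-ctl -u http://<pd-address>:2379 scheduler show   # which schedulers are currently enabled
pd-ctl -u http://<pd-address>:2379 operator show    # Region moves / leader transfers currently in flight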