Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tikv扩容一个节点后执行ddl很慢 (DDL execution is very slow after scaling out TiKV by one node)
Problem description: After scaling out the cluster with one additional TiKV node, truncating an empty table is very slow. After removing that node, DDL operations on the original three-node cluster are very fast again. What could be causing this?
Take a look at the network. I see that your nodes span two subnets. TiKV works best on physical machines. If these are physical machines, 99.1 and 88.1 suggest that the nodes sit behind different switches. It is best to keep them under the same switch.
Also, after this KV node was added, were all the regions rebalanced?
Automatic region balancing is enabled.
Is the performance of the new node poor?
The performance of the newly added node is the same as the previous three nodes.
Was the DDL executed after region balancing had completed, or while the regions were still being balanced?
After a scale-out, it usually takes at least a day for the regions to finish balancing.
Check the TiDB logs corresponding to the DDL owner node to see if there are any hints.
Is your cluster lagging? Check the latency on the dashboard monitoring.
Which specific metrics should I look at? Please clarify, thanks.
Check the CPU, memory, and latency in the dashboard. Then look at the TiKV metrics in Grafana to see whether the I/O latency is within the normal range of tens of milliseconds.
I checked the TiDB logs. They contain only INFO-level messages, with nothing obviously relevant.
Check whether the regions are balanced across the TiKV nodes. Also, verify that region scheduling in PD is working normally.
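One way to check the region balance without opening Grafana is to read the per-store counts from PD. The snippet below is only a rough sketch: it assumes PD's HTTP API is reachable at an address like `http://127.0.0.1:2379` (a placeholder, not taken from this thread) and that the `/pd/api/v1/stores` response carries `region_count` and `leader_count` under each store's `status`. The function names are made up for illustration.

```python
import json
import urllib.request


def fetch_stores(pd_addr):
    """Fetch store information from PD's HTTP API.

    pd_addr is a placeholder such as "http://127.0.0.1:2379".
    """
    with urllib.request.urlopen(f"{pd_addr}/pd/api/v1/stores", timeout=5) as resp:
        return json.load(resp)


def region_balance(stores_json):
    """Return per-store (region_count, leader_count) and the max/min region ratio."""
    counts = {}
    for item in stores_json.get("stores", []):
        address = item["store"]["address"]
        status = item.get("status", {})
        # Missing fields are treated as 0 (for example, a freshly added store).
        counts[address] = (status.get("region_count", 0), status.get("leader_count", 0))
    regions = [r for r, _ in counts.values()]
    # A ratio close to 1.0 means the stores hold a similar number of regions.
    ratio = max(regions) / min(regions) if regions and min(regions) > 0 else float("inf")
    return counts, ratio
```

Calling `region_balance(fetch_stores("http://127.0.0.1:2379"))` on a PD node would return the counts per TiKV address. If the new node's region count is still far below the others, balancing has probably not finished yet.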
You can check if there are any issues with NTP.
Also, what is the ping result from 63.120 to 52.67?
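If ICMP ping is blocked between the subnets, timing a TCP connection gives a rough substitute. This is a minimal sketch, not a replacement for proper network diagnostics: the host argument is a placeholder because the thread only shows partial addresses, 20160 is assumed to be the TiKV service port (its default), and the function name is hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, attempts=5, timeout=2.0):
    """Measure TCP connect time to host:port in milliseconds, once per attempt."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Run it on one TiKV host against the other, for example `tcp_connect_latency_ms("<new-tikv-host>", 20160)`, and compare the result with the same measurement between two of the original nodes. A clearly higher or unstable connect time to the new node would point toward the network.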
Check the relevant performance metrics of this TiKV node.
It should be a network issue.
Is the machine configuration consistent? Is the cluster still in the expansion period?
He has a problem with his network.