After scaling out TiKV by one node, DDL runs very slowly

translator_bot · June 21, 2024, 7:25am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv扩容一个节点后执行ddl很慢

username: TiDBer_yBunUeUc

Problem description: After scaling out the cluster with one additional TiKV node, truncating an empty table runs very slowly. After that node is removed, DDL on the original three-node cluster is fast again. Can anyone explain why?

username: tidb狂热爱好者

Check the network. Your nodes appear to be on two different subnets. TiKV works best on physical machines; if these are physical machines, addresses ending in 99.1 and 88.1 suggest the nodes are connected through different switches. Ideally, all nodes should be under the same switch.
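
To check the subnet point quickly, the node addresses can be grouped by network prefix. The sketch below uses Python's standard ipaddress module; it assumes /24 subnets and uses hypothetical addresses from documentation-only ranges, not the poster's real IPs. Sharing a subnet does not prove that nodes share a switch, so treat the result only as a first hint.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(addresses, prefix_len=24):
    """Group host addresses by the network they fall in for the given prefix length."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[network].append(addr)
    return dict(groups)


# Hypothetical TiKV node addresses (documentation ranges, not from the thread).
tikv_nodes = ["192.0.2.11", "192.0.2.12", "192.0.2.13", "198.51.100.14"]
subnets = group_by_subnet(tikv_nodes)
for net, hosts in subnets.items():
    print(net, hosts)
if len(subnets) > 1:
    print("TiKV nodes span", len(subnets), "subnets; check the network path between them")
```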

username: ShawnYan

Also, after the new TiKV node was added, had all the Regions finished rebalancing?

username: TiDBer_yBunUeUc

The parameter for automatic Region balancing is enabled.

username: 哈喽沃德

Does the new node perform worse than the others?

username: TiDBer_yBunUeUc

The newly added node has the same performance as the original three nodes.

username: 路在何chu

Did you run the DDL after Region balancing had completed, or while the Regions were still being rebalanced?

username: 路在何chu

After a scale-out, it usually takes at least a day for the Regions to finish rebalancing.
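
One rough way to tell whether rebalancing has finished is to compare each store's Region count with the cluster average. The sketch below assumes rows shaped like the output of `SELECT ADDRESS, REGION_COUNT, LEADER_COUNT FROM INFORMATION_SCHEMA.TIKV_STORE_STATUS`; the store addresses, counts, and the 20% tolerance are hypothetical. PD itself schedules on Region score (which accounts for size), not raw count, so this is only an approximation.

```python
def region_imbalance(rows):
    """Map each store address to its Region count divided by the cluster mean."""
    counts = {addr: region_count for addr, region_count, _leader_count in rows}
    mean = sum(counts.values()) / len(counts)
    # 1.0 means the store holds exactly the average number of Regions.
    return {addr: count / mean for addr, count in counts.items()}


def is_balanced(rows, tolerance=0.2):
    """True if every store is within `tolerance` (hypothetical threshold) of the mean."""
    return all(abs(ratio - 1.0) <= tolerance for ratio in region_imbalance(rows).values())


# Hypothetical rows: (ADDRESS, REGION_COUNT, LEADER_COUNT)
stores = [
    ("tikv-1:20160", 3000, 1000),
    ("tikv-2:20160", 3000, 1000),
    ("tikv-3:20160", 3000, 1000),
    ("tikv-4:20160", 600, 150),  # newly added store, still receiving Regions
]
print(is_balanced(stores))
for addr, ratio in region_imbalance(stores).items():
    print(f"{addr}: {ratio:.2f}x mean")
```

Here the new store holds about a quarter of the average Region count, so it would still be far from balanced.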

username: TiDBer_yBunUeUc

Running now

username: 胡杨树旁

Check the TiDB log on the node that is currently the DDL owner to see whether it gives any clues.
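
The DDL owner can be found with `ADMIN SHOW DDL`, whose output includes an OWNER_ADDRESS column. Once you have that node's log, one way to locate the slow step is to look for large time gaps between consecutive DDL-related lines. The sketch below assumes TiDB's unified log format (`[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ] [LEVEL] [file.go:line] ["message"]`) and a simple case-insensitive keyword match; the sample log lines, keyword, and 1-second threshold are hypothetical, not real TiDB messages.

```python
import re
from datetime import datetime

# Assumed TiDB unified log prefix, e.g. "[2024/06/21 07:25:00.123 +08:00]".
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{2}:\d{2})\]")


def slow_gaps(lines, keyword="ddl", threshold_s=1.0):
    """Return (gap_seconds, previous_line, line) for consecutive lines containing
    `keyword` whose timestamps are more than `threshold_s` seconds apart."""
    prev_ts, prev_line = None, None
    gaps = []
    for line in lines:
        if keyword.lower() not in line.lower():
            continue
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines that do not start with a parsable timestamp
        ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f %z")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > threshold_s:
                gaps.append((gap, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps


# Hypothetical log lines for illustration only.
sample = [
    '[2024/06/21 07:25:00.100 +08:00] [INFO] [example.go:1] ["hypothetical DDL step A"]',
    '[2024/06/21 07:25:00.300 +08:00] [INFO] [example.go:2] ["hypothetical DDL step B"]',
    '[2024/06/21 07:25:12.300 +08:00] [INFO] [example.go:3] ["hypothetical DDL step C"]',
]
for gap, before, after in slow_gaps(sample):
    print(f"{gap:.1f}s gap before: {after}")
```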

username: 像风一样的男子

Is your cluster slow in general? Check the latency in the Dashboard monitoring.

username: TiDBer_yBunUeUc

Which specific metrics should I look at? Please clarify, thanks.

username: 像风一样的男子

Check CPU, memory, and latency in the Dashboard. Then look at the TiKV metrics in Grafana to see whether I/O latency is in its normal range, which is typically tens of milliseconds.
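
If you want to cross-check disk latency directly on a TiKV host, average per-request latency can be computed from two samples of the cumulative counters in Linux `/proc/diskstats` (the same arithmetic as dividing node_exporter's `node_disk_read_time_seconds_total` rate by `node_disk_reads_completed_total` rate). The sketch below is a pure function over two sample lines; on a live node you would read `/proc/diskstats` twice a few seconds apart. The device name and counter values are hypothetical.

```python
def parse_diskstats_line(line):
    """Extract completed I/Os and time spent (ms) from one /proc/diskstats line."""
    f = line.split()
    return {
        "device": f[2],
        "reads": int(f[3]),      # reads completed
        "read_ms": int(f[6]),    # total time spent on reads, ms
        "writes": int(f[7]),     # writes completed
        "write_ms": int(f[10]),  # total time spent on writes, ms
    }


def avg_latency_ms(before, after):
    """Average read and write latency (ms per request) between two samples."""
    reads = after["reads"] - before["reads"]
    writes = after["writes"] - before["writes"]
    read_lat = (after["read_ms"] - before["read_ms"]) / reads if reads else 0.0
    write_lat = (after["write_ms"] - before["write_ms"]) / writes if writes else 0.0
    return read_lat, write_lat


# Hypothetical samples of the same device taken some seconds apart.
before = parse_diskstats_line("259 0 nvme0n1 1000 0 8000 500 2000 0 16000 4000 0 0 0")
after = parse_diskstats_line("259 0 nvme0n1 1100 0 8800 1500 2200 0 17600 8000 0 0 0")
read_ms, write_ms = avg_latency_ms(before, after)
print(f"{before['device']}: read {read_ms:.1f} ms, write {write_ms:.1f} ms")
```

With these sample values, 100 reads took 1000 ms in total (10 ms each) and 200 writes took 4000 ms (20 ms each).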