Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Inquiry: the difference between ping latency and manual ping
[TiDB Environment] Production Environment
[TiDB Version] 5.0.4
Problem Description:
For a long time, the ping latency of one machine has been significantly higher than that of the other nodes. I want to understand how ping latency is measured.
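A minimal sketch of the idea behind any ping-style latency figure. It assumes nothing about how the cluster's monitoring actually collects its value: take a timestamp, send a small probe, and stop the clock when the reply arrives. Real ping uses ICMP echo, which needs raw sockets (usually root), so this sketch swaps in a UDP echo server on loopback instead. The helper names `start_udp_echo_server` and `measure_rtt` are hypothetical.

```python
import socket
import threading
import time


def start_udp_echo_server(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Start a throwaway UDP echo server on an OS-chosen port (illustration only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0: let the OS pick a free port
    port = sock.getsockname()[1]

    def serve() -> None:
        while True:
            try:
                data, addr = sock.recvfrom(1024)
            except OSError:
                return  # socket was closed, stop serving
            sock.sendto(data, addr)  # echo the payload back unchanged

    threading.Thread(target=serve, daemon=True).start()
    return sock, port


def measure_rtt(host: str, port: int, timeout: float = 1.0) -> float:
    """Return one round-trip time in seconds: from send until the echo arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)  # a lost reply raises a timeout error instead of hanging
        start = time.perf_counter()
        client.sendto(b"probe", (host, port))
        client.recvfrom(1024)
        return time.perf_counter() - start


if __name__ == "__main__":
    server, port = start_udp_echo_server()
    samples = [measure_rtt("127.0.0.1", port) for _ in range(5)]
    print(f"avg RTT: {sum(samples) / len(samples) * 1000:.3f} ms")
    server.close()
```

Whatever tool produces the number, it is a difference of two timestamps taken on the probing host, so a slow reply path, a slow responder, or delay added by the prober itself all show up in the same figure.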
Current Findings:
The problem appears to be with the TiKV-A machine (a TiKV node):
- The network latency from all PD nodes to TiKV-A is 4.5s.
- The latency from every other TiKV node to TiKV-A is 4.5s; only TiKV-A's latency to itself is unaffected.
- The network latency between the other TiKV nodes, and from PD to those TiKV nodes, is at the microsecond level.
- The NTP service of the node is normal, and the NTP delay is within the normal range (much less than 4.5s).
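The findings above follow a recognizable pattern: every other source sees the same high latency to one target, while that target's probe of itself is fine. A minimal sketch of that reasoning over a latency matrix, assuming the values are collected per source/target pair; the node names and numbers below are hypothetical and only mirror the shape of the findings.

```python
# Hypothetical latency matrix in seconds: LATENCY[source][target].
# Shaped like the findings above; these are not real measurements.
LATENCY = {
    "pd-1":   {"tikv-a": 4.5, "tikv-b": 0.0002, "tikv-c": 0.0002},
    "tikv-a": {"tikv-a": 0.0001},  # only the self-probe is assumed here
    "tikv-b": {"tikv-a": 4.5, "tikv-b": 0.0001, "tikv-c": 0.0002},
    "tikv-c": {"tikv-a": 4.5, "tikv-b": 0.0002, "tikv-c": 0.0001},
}


def slow_targets(latency: dict[str, dict[str, float]], threshold_s: float = 1.0) -> list[str]:
    """Return targets that exceed the threshold as seen from every *other* source."""
    targets = {dst for row in latency.values() for dst in row}
    result = []
    for dst in sorted(targets):
        # Ignore the target's probe of itself; we only care about how others see it.
        from_others = [row[dst] for src, row in latency.items() if src != dst and dst in row]
        if from_others and all(v > threshold_s for v in from_others):
            result.append(dst)
    return result


print(slow_targets(LATENCY))  # ['tikv-a']
```

A target flagged this way, whose self-probe is still fast, points at something on the path into that machine (or at how it answers probes) rather than at the individual probing nodes.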
Are the connected nodes all on the same network?
Did the TiKV-A server itself experience network latency?
There might be a network issue on the TiKV-A machine. Can you try pinging it from another machine to check?
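A minimal sketch for turning a manual ping into numbers that can be compared with the dashboard, assuming the summary line printed by Linux iputils ping (`rtt min/avg/max/mdev = ... ms`) or its BSD/macOS variant (`round-trip min/avg/max/stddev = ... ms`). The helper names are hypothetical, and `ping_host` needs a `ping` binary that accepts `-c`.

```python
import re
import subprocess

# Matches the iputils summary line and its BSD/macOS variant.
RTT_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_rtt(output: str) -> dict[str, float]:
    """Extract min/avg/max/deviation (milliseconds) from ping's summary line."""
    m = RTT_RE.search(output)
    if m is None:
        # ping omits the rtt line when no reply was received at all
        raise ValueError("no rtt summary line found")
    keys = ("min_ms", "avg_ms", "max_ms", "dev_ms")
    return {k: float(v) for k, v in zip(keys, m.groups())}


def ping_host(host: str, count: int = 5) -> dict[str, float]:
    """Run ping COUNT times against HOST and parse its summary."""
    proc = subprocess.run(
        ["ping", "-c", str(count), host], capture_output=True, text=True, check=False
    )
    return parse_ping_rtt(proc.stdout)
```

Usage: `ping_host("tikv-a")["avg_ms"]` from each of the other machines gives one row of manual measurements, with `tikv-a` standing in for whatever hostname or IP the node actually has.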
The NTP service appears to be normal, and its delay is also within the normal range.
The network latency from tikv-a to the other machines, and from the other machines to tikv-a, is at the millisecond level.
NTP is used for time synchronization and should have little to do with ping. On TiKV-A, can you run tc qdisc show dev eth0 (replacing eth0 with the network interface actually in use) and take a look at the output?
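The reason to look at tc is that a netem qdisc can add an artificial delay to every packet leaving an interface, which would inflate latency without any real network problem. A minimal sketch for spotting such a delay in the command's output, assuming netem lines look like `qdisc netem 8001: root refcnt 2 limit 1000 delay 4.5s` with the delay printed in `s`, `ms`, or `us`; the exact formatting may differ between iproute2 versions, and `netem_delays` is a hypothetical helper.

```python
import re

UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}
# First "delay <number><unit>" on a line; any jitter value after it is ignored.
DELAY_RE = re.compile(r"\bdelay\s+([\d.]+)(s|ms|us)\b")


def netem_delays(tc_output: str) -> list[float]:
    """Return the configured netem delay, in seconds, for each netem qdisc line."""
    delays = []
    for line in tc_output.splitlines():
        if not line.strip().startswith("qdisc netem"):
            continue  # other qdiscs (fq_codel, mq, pfifo_fast, ...) add no fixed delay
        m = DELAY_RE.search(line)
        if m:
            value, unit = m.groups()
            delays.append(float(value) * UNIT_TO_SECONDS[unit])
    return delays
```

An empty list means no netem delay is configured on that interface, so the 4.5s would have to come from somewhere else.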
Is it across regions or subnets?
They are machines in the same region.
Is there packet loss or bandwidth limitation?
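Packet loss can be read straight from the summary that ping prints. A minimal sketch, assuming the Linux form `5 packets transmitted, 5 received, ...` or the macOS form `5 packets transmitted, 5 packets received, ...`; `packet_loss_ratio` is a hypothetical helper.

```python
import re

# "N packets transmitted, M received" (Linux) or "... M packets received" (macOS)
COUNTS_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_ratio(ping_output: str) -> float:
    """Return lost/transmitted as a fraction between 0.0 and 1.0."""
    m = COUNTS_RE.search(ping_output)
    if m is None:
        raise ValueError("no packet count summary found")
    transmitted, received = (int(g) for g in m.groups())
    if transmitted == 0:
        raise ValueError("no packets were transmitted")
    return (transmitted - received) / transmitted
```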
How exactly are you checking the ping latency?
I haven't tested it yet, but you can give it a try. I am using the cloud vendor's ECS service, and its bandwidth limit should be fairly generous.