TiKV log reports “pd worker send latency inspector failed”

translator_bot · June 23, 2024, 7:57am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv日志提示pd worker send latency inspecter failed (“TiKV log reports pd worker send latency inspecter failed”)

| username: Hacker007

Today I found that one of our three TiKV nodes was down, and its log kept repeating “pd worker send latency inspector failed”. I then tried scaling out a new TiKV node, but it hit the same error. The other two nodes are fine.

| username: songxuecheng

Check whether PD has any disk I/O issues.
Send the complete TiKV logs.
Send the TiKV monitoring data.

| username: Hacker007

This node's log keeps printing that line. I have monitoring set up, but I'm not sure which metrics to look at.

| username: Hacker007

Now it is reporting this again.

| username: songxuecheng

Please send the complete monitoring dashboards.

| username: h5n1

Check the network status between PD and TiKV.

| username: Hacker007

This is related to GC. I saw errors in the GC logs!

| username: Hacker007

I still need to study the monitoring metrics carefully…

| username: Hacker007

How do I fix the disconnection between PD and TiKV?

| username: h5n1

Disconnected? Is the network down? If the network is fine, TiKV will reconnect automatically.

| username: Hacker007

telnet succeeds, so I've ruled out network problems. There are two clusters, an old one and a new one. The old one has no issues, but the new one, which was scaled out with two new TiKV nodes, is having connectivity problems: it first shows as Disconnected and then as Down. I'm not sure whether the old cluster is affecting it.

| username: h5n1

Did you telnet both of TiKV's ports?

| username: Hacker007

Yes. While the node shows as Down in PD, telnetting to both TiKV ports works, and so does telnetting to PD from the TiKV node.
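The telnet checks described above (both TiKV ports, plus PD from the TiKV host) can be scripted as a plain TCP connect probe. This is a minimal sketch in Python, assuming the default ports (20160 for the TiKV server, 20180 for the TiKV status service, 2379 for PD client traffic); the helper names `tcp_reachable` and `probe_all` are hypothetical. Like telnet, a successful connect only shows that the port accepts TCP connections, not that the gRPC traffic between PD and TiKV is healthy.

```python
import socket

# Assumed default ports; check your cluster topology file for the real values.
TIKV_SERVER_PORT = 20160  # TiKV server (gRPC) port
TIKV_STATUS_PORT = 20180  # TiKV status port
PD_CLIENT_PORT = 2379     # PD client port


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like `telnet host port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS failures (all OSError subclasses).
        return False


def probe_all(targets: dict, timeout: float = 3.0) -> dict:
    """Probe each named (host, port) target and map its name to True/False."""
    return {
        name: tcp_reachable(host, port, timeout)
        for name, (host, port) in targets.items()
    }
```

For example, `probe_all({"tikv-server": ("<tikv-host>", TIKV_SERVER_PORT), "pd-client": ("<pd-host>", PD_CLIENT_PORT)})` returns a name-to-boolean map. Running it from both the PD host and the TiKV host is worthwhile, since firewall rules can make reachability differ by direction.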

| username: Hacker007

The TiKV log also keeps reporting: pd worker send latency inspector failed
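To tell whether this warning is a steady stream or comes in bursts around specific events (such as the GC errors mentioned earlier), one option is to count matching log lines per minute. A minimal sketch in Python, assuming each TiKV log line begins with a timestamp of the form `[YYYY/MM/DD HH:MM:SS.mmm +TZ]`; the default substring `send latency inspect` is a hypothetical choice meant to match both spellings seen in this thread (“inspector” and “inspecter”), since the exact log text is not confirmed here.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log line prefix: "[YYYY/MM/DD HH:MM:SS..."; the group captures up to the minute.
TIMESTAMP_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}")

# Hypothetical needle covering both "inspector" and "inspecter".
DEFAULT_NEEDLE = "send latency inspect"


def count_per_minute(lines: Iterable[str], needle: str = DEFAULT_NEEDLE) -> Counter:
    """Count lines containing `needle`, bucketed by their 'YYYY/MM/DD HH:MM' prefix."""
    counts: Counter = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        # Matching lines without a recognizable timestamp go into an "unknown" bucket.
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

For example, `with open("tikv.log") as f: print(sorted(count_per_minute(f).items()))` prints one (minute, count) pair per minute, where `tikv.log` stands for the node's actual log file path.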

| username: Min_Chen

Hello, could you please check if there are any anomalies in the PD leader’s logs?

| username: Hacker007

Thanks for your help. Restarting TiKV solved the issue, but afterwards there was a data inconsistency, which was resolved by restarting TiDB.
