TiKV Prints a Large Number of ["kv rpc failed"] Logs

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv 打印大量 [“kv rpc failed”]

| username: TiDBer_7iWMwTcC

【TiDB Usage Environment】Production Environment
【TiDB Version】v7.5.0
【Encountered Issue: Problem Phenomenon and Impact】I found a large number of ["kv rpc failed"] [err=RemoteStopped] log entries in TiKV. Why are these logs printed, and do they affect stability or performance?

| username: xfworld

Check whether all nodes in the cluster are functioning properly.

| username: TiDBer_7iWMwTcC

Status of each node

| username: tidb菜鸟一只

Info-level logs should not have any impact, right?

| username: 小龙虾爱大龙虾

Info-level logs are fine. Also check the monitoring: if TiDB’s KV requests show no issues, I don’t think there is a problem. TiKV’s logging is simply quite verbose.

| username: 普罗米修斯

The logging is too verbose. Messages at the info level don’t affect usage and can be ignored. I have changed my log level to error.

| username: Lystorm

Are there any large-scale write operations being performed?

| username: Soysauce520

Could you check if there are any fluctuations in the network?

| username: zhanggame1

You don’t need to worry about info-level logs. If you’re concerned, check whether the CPU load is too high and whether the network is stable.

| username: stephanie

My TiKV logs also contain a large number of similar info-level messages, but the PD and TiDB logs show nothing abnormal. I wonder if this is the same issue as the one below:

| username: TiDBer_HErMeXDz

Is there a network issue with RPC?

| username: ffeenn

It can be ignored if it doesn’t affect anything.

| username: Jellybean

This is an error from an RPC request that accesses KV data. It is logged at the info level, which suggests an internal retry mechanism handles such cases, so it generally has no impact.

To be on the safe side, you can conduct further inspection and analysis of the cluster’s health status.
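As a starting point for that inspection, here is a minimal Python sketch that counts matching log lines by level and `err` value. It assumes the logs use the bracketed unified log format (`[time] [LEVEL] [source] ["message"] [key=value]`). The sample lines, including the `example.rs:1` source field, are made up for illustration, and `summarize` is a hypothetical helper, not part of any TiKV tooling.

```python
import re
from collections import Counter

# Assumed line shape: [time] [LEVEL] [source] ["message"] [key=value] ...
LEVEL_RE = re.compile(r'^\[[^\]]*\] \[(\w+)\]')
ERR_RE = re.compile(r'\[err=([^\]]*)\]')

# Hypothetical sample lines; real entries will carry other fields.
SAMPLE_LINES = [
    '[2024/01/01 10:00:00.000 +08:00] [INFO] [example.rs:1] ["kv rpc failed"] [err=RemoteStopped]',
    '[2024/01/01 10:00:01.000 +08:00] [INFO] [example.rs:1] ["kv rpc failed"] [err=RemoteStopped]',
    '[2024/01/01 10:00:02.000 +08:00] [WARN] [example.rs:1] ["something else"]',
]


def summarize(lines, needle='"kv rpc failed"'):
    """Count lines containing `needle`, keyed by (level, err value)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        level = LEVEL_RE.match(line)
        err = ERR_RE.search(line)
        key = (
            level.group(1) if level else "UNKNOWN",
            err.group(1) if err else "UNKNOWN",
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for (level, err), n in summarize(SAMPLE_LINES).most_common():
        print(level, err, n)
```

To use it on a real log, pass an open file object instead of `SAMPLE_LINES`. If every hit is at the info level with the same `err`, that matches what others in this thread describe.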

| username: 呢莫不爱吃鱼

The info level is fine, no need to worry too much.

| username: stephanie

Thank you for the suggestion. I’ll check it out when I get to work. I noticed that there is quite a bit of relevant information in the logs.

| username: zhaokede

Check the network; there are probably some fluctuations.

| username: TIDB-Learner

Ignore it. The impact is not significant. You are running a mixed deployment, right? Generally, deploying PD and TiKV on the same machines is not recommended.

| username: YuchongXU

Can be ignored.

| username: paulli

Take a look at the backoff.

| username: stephanie

I found a lot of error messages in the logs of one of the TiKV machines. The Batch receive average duration in TiDB on Grafana also looks abnormal. I’m not sure if they are related. Does anyone know what’s going on?
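One way to check whether the two are related is to bucket the log entries per minute and compare the spikes with the time axis of the Grafana panel. The sketch below is only an illustration: it assumes timestamps in the unified log format (`2024/01/01 10:00:00.000 +08:00`), the sample lines are made up, and `per_minute` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed timestamp shape of the first bracketed field in each log line.
TS_RE = re.compile(r'^\[([^\]]+)\]')
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f %z"

# Hypothetical sample lines for illustration only.
SAMPLE_LINES = [
    '[2024/01/01 10:00:05.000 +08:00] [INFO] [example.rs:1] ["kv rpc failed"] [err=RemoteStopped]',
    '[2024/01/01 10:00:40.000 +08:00] [INFO] [example.rs:1] ["kv rpc failed"] [err=RemoteStopped]',
    '[2024/01/01 10:01:10.000 +08:00] [INFO] [example.rs:1] ["kv rpc failed"] [err=RemoteStopped]',
]


def per_minute(lines, needle='"kv rpc failed"'):
    """Count lines containing `needle`, keyed by 'YYYY-MM-DD HH:MM'."""
    buckets = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TS_RE.match(line)
        if not match:
            continue
        try:
            ts = datetime.strptime(match.group(1), TS_FORMAT)
        except ValueError:
            continue  # skip lines whose first field is not a timestamp
        buckets[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return buckets


if __name__ == "__main__":
    for minute, n in sorted(per_minute(SAMPLE_LINES).items()):
        print(minute, n)
```

If the busiest minutes line up with the abnormal stretches on the panel, the two are probably related. If they don’t, the log entries are more likely just noise.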