Errors appear continuously in the TiDB server log

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb server的日志里持续有error

| username: TiDBer_yUoxD0vR

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] v3.0.13
[Encountered Issue: Issue Phenomenon and Impact]
The TiDB server log reports the following error every 30 seconds, even when there is no business traffic. Is there any way to locate the cause? So far it does not seem to affect the business.
2023/07/31 14:57:18.837 terror.go:357: [error] EOF
github.com/pingcap/errors.AddStack
/home/jenkins/agent/workspace/tidb_v3.0.13/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174
github.com/pingcap/errors.Trace
/home/jenkins/agent/workspace/tidb_v3.0.13/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15
github.com/pingcap/tidb/server.(*packetIO).readOnePacket
/home/jenkins/agent/workspace/tidb_v3.0.13/go/src/github.com/pingcap/tidb/server/packetio.go:80
github.com/pingcap/tidb/server.(*packetIO).readPacket
/home/jenkins/agent/workspace/tidb_v3.0.13/go/src/github.com/pingcap/tidb/server/packetio.go:105
github.com/pingcap/tidb/server.(*clientConn).readPacket
/home/jenkins/agent/workspace/tidb_v3.0.13/go/src/github.com/pingcap/tidb/server/conn.go:265
github.com/pingcap/tidb/server.(*clientConn).readOptionalSSLRequestAndHandshakeResponse
/home/jenkins/agent/workspace/tidb_v3.0.13/go/src/github.com/pingcap/tidb/server/conn.go:471
github.com/pingcap/tidb/server.(*clientConn).handshake
/home/jenkins/agent/workspace/tidb_v3.0.13/go/src/github.com/pingcap/tidb/server/conn.go:172
github.com/pingcap/tidb/server.(*Server).onConn
/home/jenkins/agent/workspace/tidb_v3.0.13/go/src/github.com/pingcap/tidb/server/server.go:345
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357


| username: Billmay表妹 | Original post link

This error comes from the TiDB Server log. EOF means the connection was closed by the other side while the server was reading from it. It is usually caused by a client or proxy disconnecting; in the stack trace above, the server hit EOF while reading the client's handshake response, that is, before the connection was fully established. The error itself does not impact business operations, but if it occurs frequently, it may affect the performance of the TiDB Server.

To pinpoint the cause of this issue, you can start from the following aspects:

  1. Check the TiDB Server configuration file to ensure that the parameters are set correctly. For example, verify if the max-connections parameter is set too low, causing insufficient connections.

  2. Check the TiDB Server log files for any other error or warning messages. If there are other error or warning messages, they might provide more clues.

  3. Check the TiDB Server monitoring information to see if there are any anomalies. You can use tools like TiDB Dashboard or Grafana to view the monitoring information of the TiDB Server, such as the number of connections, QPS, CPU usage, etc.

  4. Check the network environment of the TiDB Server to see if there are any network failures or congestion issues. You can use the ping command or other network diagnostic tools to check if the network connection is normal.

If the above methods do not resolve the issue, you can try upgrading the TiDB Server version or contact TiDB official technical support for assistance.

| username: Billmay表妹 | Original post link

The version is too old; upgrading to a newer version is recommended. Community members do not have a matching environment in which to reproduce the issue, so they cannot help you troubleshoot it.

| username: tidb菜鸟一只 | Original post link

EOF errors seem to be network-level issues.

| username: TiDBer_yUoxD0vR | Original post link

Thank you for your response. Due to the specific situation of the company, we are currently unable to upgrade. The online environment did not report any errors before (the online environment was upgraded from a lower version). One day, the TiDB CPU was fully utilized, and after changing the feedback-probability to 0.00 and restarting the TiDB server, this error started to occur. In the test environment, no matter how the parameters are changed, the error persists.

Increasing max-connections also didn’t help. The show processlist command shows no connections, there is no business running, and the logs only contain this error without any other logs. The QPS is 0, and the CPU usage is almost 0. Since there is no business running, the Grafana report does not show any anomalies. I would like to ask if there are any debugging methods to locate what request is causing this error? The error occurs very regularly, every 30 seconds. I want to know what is happening every 30 seconds.
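
One way to confirm the "every 30 seconds" pattern before hunting for the source is to measure the gaps between the error lines in the log. The following is a minimal sketch, assuming the log line format shown in the first post (`2023/07/31 14:57:18.837 terror.go:357: [error] EOF`). The function name `eof_intervals` is hypothetical, and every sample line after the first is invented for illustration.

```python
import re
from datetime import datetime

# Matches lines such as: 2023/07/31 14:57:18.837 terror.go:357: [error] EOF
EOF_LINE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ \[error\] EOF\s*$"
)


def eof_intervals(lines):
    """Return the gaps, in seconds, between consecutive '[error] EOF' lines."""
    stamps = []
    for line in lines:
        match = EOF_LINE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The first line is copied from the thread; the later timestamps are hypothetical.
sample = [
    "2023/07/31 14:57:18.837 terror.go:357: [error] EOF",
    "github.com/pingcap/errors.AddStack",
    "2023/07/31 14:57:48.837 terror.go:357: [error] EOF",
    "2023/07/31 14:58:18.837 terror.go:357: [error] EOF",
]
print(eof_intervals(sample))  # [30.0, 30.0]
```

A perfectly regular gap like this points to a scheduled job (a monitor, a health check, a cron task) rather than to application traffic.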

| username: redgame | Original post link

It won’t affect the business. Fix it when you can upgrade.

| username: dba-kit | Original post link

It looks like an error occurred during connection creation. Did you enable the SSL certificate? Or is it caused by load balancing health checks?

| username: dba-kit | Original post link

The general recommendation is to use the HTTP port 10080 for health checks, so that no errors will be logged.
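
As an illustration of that recommendation, here is a minimal sketch of a health check that talks HTTP to the status port instead of opening a raw connection to the SQL port. It assumes the TiDB status service is listening on port 10080 and answers `GET /status` with a JSON document; the function name `tidb_status_ok` and the timeout value are hypothetical.

```python
import http.client
import json
import urllib.request


def tidb_status_ok(host, port=10080, timeout=2.0):
    """Return True if http://host:port/status answers 200 with a JSON body."""
    url = f"http://{host}:{port}/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))  # raises ValueError if not JSON
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-JSON reply.
        return False


# Example call (the address is an assumption, adjust to your deployment):
# tidb_status_ok("127.0.0.1")
```

Because this check completes a normal HTTP request, it never leaves a half-finished MySQL handshake on port 4000 for the server to log.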

| username: zhanggame1 | Original post link

Is there a load balancer in front of TiDB?

| username: TiDBer_iCdTOZ1r | Original post link

Learned.

| username: 像风一样的男子 | Original post link

Upgrading should resolve most of the inexplicable errors.

| username: TiDBer_yUoxD0vR | Original post link

The issue has been identified. It is the tidb_port_probe in Prometheus that probes every 30 seconds. Commenting out this line or changing it to probe 10080 will stop the error from occurring.
Can TiDB 3.0 be changed to probe 10080 instead of 4000? I see that TiDB 6 probes 10080.
Strangely, all of our clusters run 3.0 and all of them probe port 4000, yet some report the error and some do not. I don't know what triggers it.
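
To show why a port probe can succeed and still produce an EOF on the server side, here is a small self-contained sketch. It is not TiDB code: `tcp_probe` and `demo` are hypothetical names, the probe is assumed to be a plain connect-then-close check, and the toy server only imitates a server that waits to read a client's first packet after accepting a connection. A real MySQL-protocol server would first send its own greeting packet, which is omitted here to keep the demo deterministic.

```python
import socket
import threading


def tcp_probe(host, port, timeout=2.0):
    """Connect-only check: succeed once the TCP connection opens, then hang up."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def demo():
    """Probe a toy server and report (probe_ok, server_saw_eof)."""
    observed = {}
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        with conn:
            try:
                reply = conn.recv(4)  # wait for the client's first packet
                observed["eof"] = reply == b""  # empty read: peer closed (EOF)
            except ConnectionResetError:
                observed["eof"] = False  # peer reset the connection instead

    worker = threading.Thread(target=serve_once, daemon=True)
    worker.start()
    probe_ok = tcp_probe("127.0.0.1", port)
    worker.join(timeout=5)
    listener.close()
    return probe_ok, observed.get("eof")


print(demo())  # (True, True)
```

The probe reports success because the TCP connection opened, while the server is left reading from a connection that was closed before any handshake data arrived.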

| username: Kongdom | Original post link

Could there be a firewall?

| username: TiDBer_yUoxD0vR | Original post link

There is no firewall. Probing port 4000 succeeds; it just produces this error in the log.

| username: dba-kit | Original post link

This error is actually just a handshake failure, so it’s not a big issue. The main problem is that your versions are too old, and the alert rules are not optimal yet.

| username: YuchongXU | Original post link

You can ignore it or upgrade.

| username: cassblanca | Original post link

This is the earliest version I’ve ever seen, you must be a die-hard fan.
You should send your cousin to provide on-site support for the upgrade. :joy:
