A large number of ERROR logs appear on the TiDB Server node: write: broken pipe

translator_bot · June 21, 2024, 5:02pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB Server节点出现大量的ERROR日志：write: broken pipe

| username: seiang

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.1
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Log/Monitoring]

On October 24th, a large number of "write: broken pipe" connection errors appeared in the TiDB Server logs (screenshot in the original post).

The application team also reported brief connection interruptions, although nothing had been changed on the application side.

| username: Fly-bird

This could be a network issue or a TiDB Server anomaly. Have you set up load balancing in front of TiDB?

| username: seiang

We checked the network and found no obvious anomalies, and the TiDB Server did not restart or show any other problem. TiDB uses Consul for load balancing, and during this period the ERROR logs above appeared on both TiDB Servers.

| username: TiDBer_小阿飞

Does the TiDB/KV Errors panel in Grafana include errors like write conflicts?

| username: seiang

Here is the TiDB/KV Errors panel (screenshot in the original post). It doesn't show any obvious anomalies.

| username: TiDBer_小阿飞

The red series on the Lock Resolve OPS panel looks a bit abnormal. Does txnLock on the KV Backoff OPS panel rise noticeably during the same period? Could this be a brief write conflict caused by resolving locks?

| username: seiang

It seems that txnLock did not significantly increase during the same period.

| username: xfworld

So TiDB uses Consul for load balancing? Does that mean the client ultimately connects directly to a TiDB node?

| username: TiDBer_小阿飞

The red query_resolve_lock_lite series in Lock Resolve OPS and txnLock in KV Backoff OPS line up in time: 18:00-19:00, 20:00, and 21:30.

| username: 有猫万事足

This error occurs when one side writes data to a connection that the other side has closed or is closing.

If the connection pool strategy on the application side is not the cause, check the settings of any proxy in between, such as HAProxy. Connection count limits on either side can also cause this problem.
For reference, see: "Golang service error: write: broken pipe" (CSDN blog).
TiDB itself does not proactively close this connection. The fact that the error is logged on the server side also suggests that the party closing the connection is most likely not TiDB.
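
To make the mechanism above concrete, here is a minimal sketch in Python (not TiDB code) of a writer hitting a connection that its peer has already closed. The function name `write_after_peer_close` is made up for illustration, and a local `socket.socketpair()` on Linux or macOS stands in for the real client/server TCP link. Over real TCP the first write after the peer closes may still succeed, with the error surfacing only on a later write.

```python
import errno
import socket


def write_after_peer_close(payload: bytes = b"row data") -> int:
    """Return the errno raised when writing to a socket whose peer has closed.

    Returns 0 if the write unexpectedly succeeds.
    """
    # A connected pair of local sockets stands in for the server/client link.
    server_side, client_side = socket.socketpair()
    try:
        # The "client" (or a proxy in between) drops the connection first.
        client_side.close()
        try:
            # The "server" then tries to send its response.
            server_side.sendall(payload)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        server_side.close()


if __name__ == "__main__":
    code = write_after_peer_close()
    print(code, errno.errorcode.get(code, "no error"))
```

The side that logs the error is the one that was still writing, which is consistent with the point above that the peer (the application, its connection pool, or a proxy) is the one that closed the connection.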
