Encountering TiDB 5.4, Logs Continuously Report Errors

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 遇到TiDB 5.4,日志一直报错

| username: japson

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.2
[Encountered Issues]
Issue 1: The log keeps reporting the following error. How can it be resolved?
[2023/01/06 15:46:21.745 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="[server:8052]invalid sequence 68 != 1"] [stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]

Issue 2: Today this cluster's memory suddenly filled up within moments, and the TiDB server's IO was also maxed out, leaving it unable to respond to business requests.

| username: 我是咖啡哥 | Original post link

For issue 2, you should look at the slow SQL that was running at the time.
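
One way to do that is to query the slow-query system table directly for the incident window. The sketch below assumes the spike happened around the time of the error in the original post, so adjust the time range as needed:

```sql
-- Slowest user statements around the incident, ranked by peak memory use.
-- The time window below is an assumption; replace it with the actual window.
SELECT time, query_time, mem_max, query
FROM information_schema.cluster_slow_query
WHERE is_internal = false
  AND time BETWEEN '2023-01-06 15:30:00' AND '2023-01-06 16:00:00'
ORDER BY mem_max DESC
LIMIT 10;
```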

| username: TiDBer_jYQINSnf | Original post link

Disk IO and memory are both maxed out, and there are clearly table scan operations going on. Check whether any new SQL queries have been deployed recently.
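
If you are not sure which statements were recently added, the statements summary tables give a per-pattern view. The sketch below simply ranks patterns by peak memory, which tends to surface full table scans:

```sql
-- Statement patterns ranked by peak memory; the PLAN column will show
-- TableFullScan operators if a pattern is scanning the whole table.
SELECT digest_text, exec_count, avg_mem, max_mem, plan
FROM information_schema.cluster_statements_summary
ORDER BY max_mem DESC
LIMIT 10;
```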

| username: japson | Original post link

I didn’t see any particularly slow SQL on the dashboard, but I noticed a rather unusual slow SQL in the logs:
sql="SELECT * from parts_status\r\n where ORDER_NO in (…). This table is very small, only a few hundred thousand rows, but the content inside this IN clause is quite large. This might be causing the issue. I’ll need to keep observing to see if this problem persists.
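
To confirm whether that statement is actually the cause, something like the following sketch can help; the IN values are placeholders, since the real list was not posted:

```sql
-- EXPLAIN ANALYZE executes the statement and reports the actual operators,
-- row counts, and memory used; a TableFullScan on parts_status here would
-- line up with the memory and IO spike.
EXPLAIN ANALYZE
SELECT * FROM parts_status
WHERE ORDER_NO IN ('A001', 'A002', 'A003');  -- placeholder values
```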

| username: ffeenn | Original post link

You can refer to this article to troubleshoot the issue: A large number of [ERROR] [terror.go:307] ["encountered error"] [error=EOF] appearing in tidb log after upgrading to 5.3.0 - TiDB / Deployment & Operations Management - TiDB Q&A Community (asktug.com)

| username: 裤衩儿飞上天 | Original post link

There is no particularly slow query, but a lot of concurrent queries against a table with a few hundred thousand rows is still alarming. Check what the CPU usage looks like.

| username: japson | Original post link

CPU usage is very low, and this cluster is an internal system with low concurrency. Currently, the most suspicious part seems to be the SQL: “SELECT * from parts_status where ORDER_NO in (…)”. This table is very small, with only a few hundred thousand rows, but the content inside the IN clause is quite large. We have changed this and will continue to monitor.

| username: tidb菜鸟一只 | Original post link

Memory and IO being maxed out instantly is almost certainly an SQL issue.
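
If so, one way to keep a single runaway statement from taking down the whole tidb-server is to cap per-statement memory. The sketch below assumes a 1 GiB quota and that oom-action is set to "cancel" in the TiDB config; in v5.4 the variable is session-scoped:

```sql
-- Limit a single statement in this session to roughly 1 GiB of memory.
-- With oom-action = "cancel" in the tidb-server config, a statement that
-- exceeds the quota is cancelled instead of exhausting server memory.
SET tidb_mem_quota_query = 1073741824;
```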

| username: japson | Original post link

Indeed, but one strange thing is that TiKV doesn’t have high traffic, and the cluster isn’t under heavy load. If it were solely a slow SQL issue, I think TiKV’s traffic or performance would definitely be noticeably different.

| username: Lucien-卢西恩 | Original post link

This error occurs when TiDB disconnects from other components. It can be ignored for now as it does not affect the normal service and usage of the cluster.

A sudden memory spike is commonly seen during full table scans or HashJoin/aggregation calculations, where the TiDB server pulls a large amount of data back for computation. You can check the slow query logs for inaccurate statistics or full table scans. Refer to the slow query documentation here: Slow Query Log | PingCAP Docs
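
For the statistics angle, a quick sketch of how to check and refresh them on the table mentioned in the earlier posts (the table name is assumed from those posts):

```sql
-- A low Healthy value suggests stale statistics, which can push the
-- optimizer toward a full table scan; ANALYZE rebuilds the statistics.
SHOW STATS_HEALTHY WHERE Table_name = 'parts_status';
ANALYZE TABLE parts_status;
```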

| username: japson | Original post link

Okay, thank you.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.