[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.2
[Encountered Issues]
Issue 1: The log keeps reporting the following error. How can it be resolved?
[2023/01/06 15:46:21.745 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="[server:8052]invalid sequence 68 != 1"] [stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]
Issue 2: Today this cluster's memory suddenly filled up in an instant, and the TiDB server's IO was also maxed out, leaving it unable to respond to business requests.
I didn’t see any particularly slow SQL on the dashboard, but I noticed a rather unusual slow SQL in the logs:
sql="SELECT * from parts_status\r\n where ORDER_NO in (…). This table is very small, only a few hundred thousand rows, but the content inside this IN clause is quite large. This might be causing the issue. I’ll need to keep observing to see if this problem persists.
None of them is particularly slow, but heavy concurrent querying of a table with a few hundred thousand rows is also quite scary. Check what the CPU usage looks like.
CPU usage is very low, and this cluster is an internal system with little concurrency. The most suspicious part is still the SQL "SELECT * from parts_status where ORDER_NO in (…)". The table is very small, only a few hundred thousand rows, but the IN list is quite large. We have changed this and will keep monitoring.
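A quick way to confirm whether the large IN list forces a full table scan is EXPLAIN ANALYZE; the sketch below uses placeholder ORDER_NO values, not the ones from the real query:

    -- Inspect the actual execution plan along with per-operator time and memory
    EXPLAIN ANALYZE SELECT * FROM parts_status WHERE ORDER_NO IN ('A001', 'A002', 'A003');

If the plan shows a TableFullScan rather than an IndexRangeScan on ORDER_NO, adding an index on that column or splitting the IN list into smaller batches may help.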
Indeed, but one strange thing is that TiKV traffic isn't high and the cluster isn't under heavy load. If it were purely a slow SQL issue, I would expect TiKV's traffic or performance to show a noticeable change.
This error occurs when the connection between TiDB and another component is broken. It can be ignored for now, as it does not affect the normal service and usage of the cluster.
A sudden memory spike is commonly seen when TiDB Server pulls a large amount of data back for computation, for example during full table scans or HashJoin/aggregation calculations. Check the slow query log for inaccurate statistics or full table scans. For troubleshooting SQL queries, refer to: Slow Query Log | PingCAP Docs
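To check whether statistics are stale and which statements peaked in memory, something along these lines can be run on the cluster (the table name is taken from this thread; the LIMIT is an arbitrary example):

    -- Statistics health for the suspect table; a low Healthy value suggests re-analyzing
    SHOW STATS_HEALTHY WHERE Table_name = 'parts_status';
    ANALYZE TABLE parts_status;

    -- Recent statements with the highest peak memory (Mem_max is in bytes)
    SELECT Time, DB, Query_time, Mem_max, Query
    FROM INFORMATION_SCHEMA.SLOW_QUERY
    ORDER BY Mem_max DESC
    LIMIT 10;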