Slow queries at business peak and occasional panic reports after upgrading TiDB from v4.0.16 to v6.5.5

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDBv4.0.16升级到v6.5.5出现业务高峰慢查tidb-偶报panic

| username: chnage

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.5
[Reproduction Path] The business workload has not changed. The maximum SQL execution time is 30 seconds. Before the upgrade, slow queries did not hit this timeout; after the upgrade, slow queries are interrupted by the timeout.
[Encountered Issues: Problem Phenomenon and Impact]
Related Alerts:

  1. TiDB_server_panic_total: fires frequently (did not occur before the upgrade)
  2. TiDB_tikvclient_backoff_seconds_count: 1-2 times (occasionally monthly)
  3. TiDB_memory_abnormal: once (did not occur before the upgrade)

[Actions Taken] Recollected statistics using analyze table
[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]
[2024/01/24 09:01:02.866 +08:00] [ERROR] [client_batch.go:303] [batchSendLoop] [r={}] [stack="github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop.func1
/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.4-0.20230912041415-9c163cc8574b/internal/client/client_batch.go:305
runtime.gopanic
/usr/local/go/src/runtime/panic.go:884
runtime.goPanicIndex
/usr/local/go/src/runtime/panic.go:113
github.com/pingcap/kvproto/pkg/tikvpb.encodeVarintTikvpb
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:5438
github.com/pingcap/kvproto/pkg/tikvpb.(*BatchCommandsRequest_Request_Coprocessor).MarshalToSizedBuffer
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:4325
github.com/pingcap/kvproto/pkg/tikvpb.(*BatchCommandsRequest_Request_Coprocessor).MarshalTo
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:4313
github.com/pingcap/kvproto/pkg/tikvpb.(*BatchCommandsRequest_Request).MarshalToSizedBuffer
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:3850
github.com/pingcap/kvproto/pkg/tikvpb.(*BatchCommandsRequest).MarshalToSizedBuffer
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:3808
github.com/pingcap/kvproto/pkg/tikvpb.(*BatchCommandsRequest).Marshal
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:3766
google.golang.org/protobuf/internal/impl.legacyMarshal
/go/pkg/mod/google.golang.org/protobuf@v1.28.1/internal/impl/legacy_message.go:402
google.golang.org/protobuf/proto.MarshalOptions.marshal
/go/pkg/mod/google.golang.org/protobuf@v1.28.1/proto/encode.go:166
google.golang.org/protobuf/proto.MarshalOptions.MarshalAppend
/go/pkg/mod/google.golang.org/protobuf@v1.28.1/proto/encode.go:125
github.com/golang/protobuf/proto.marshalAppend
/go/pkg/mod/github.com/golang/protobuf@v1.5.2/proto/wire.go:40
github.com/golang/protobuf/proto.Marshal
/go/pkg/mod/github.com/golang/protobuf@v1.5.2/proto/wire.go:23
google.golang.org/grpc/encoding/proto.codec.Marshal
/go/pkg/mod/google.golang.org/grpc@v1.51.0/encoding/proto/proto.go:45
google.golang.org/grpc.encode
/go/pkg/mod/google.golang.org/grpc@v1.51.0/rpc_util.go:595
google.golang.org/grpc.prepareMsg
/go/pkg/mod/google.golang.org/grpc@v1.51.0/stream.go:1708
google.golang.org/grpc.(*clientStream).SendMsg
/go/pkg/mod/google.golang.org/grpc@v1.51.0/stream.go:846
github.com/pingcap/kvproto/pkg/tikvpb.(*tikvBatchCommandsClient).Send
/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230726063044-73d6d7f3756b/pkg/tikvpb/tikvpb.pb.go:2068
github.com/tikv/client-go/v2/internal/client.(*batchCommandsClient).send
/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.4-0.20230912041415-9c163cc8574b/internal/client/client_batch.go:519
github.com/tikv/client-go/v2/internal/client.(*batchConn).getClientAndSend
/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.4-0.20230912041415-9c163cc8574b/internal/client/client_batch.go:381
github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop
/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.4-0.20230912041415-9c163cc8574b/internal/client/client_batch.go:344"]

Slow Query
select fileid, userid, chkcode, ctime, sid, ascii_sid, permission, source, file_mtime, ext_perm from xxxx where xxxid = xxxx order by file_mtime desc limit xxx,xxx;


There is an index on xxxid and mtime.
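
Whether such an index can also serve the `order by file_mtime desc` depends on its shape, which the post does not state. The sketch below is only an illustration of that idea: it uses an in-memory SQLite database as a stand-in (not TiDB), and the table name `files`, the column `owner_id` (standing in for xxxid), and the index name `idx_demo` are all hypothetical. It compares a single-column index with a composite index and checks whether the plan needs a separate sort step.

```python
import sqlite3

# Hypothetical stand-in for the masked query: owner_id plays the role of xxxid.
QUERY = (
    "SELECT fileid, file_mtime FROM files "
    "WHERE owner_id = ? ORDER BY file_mtime DESC LIMIT 20 OFFSET 100"
)


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def needs_sort(index_columns):
    """Build the demo table with one index and report whether the plan sorts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        "fileid INTEGER PRIMARY KEY, owner_id INTEGER, file_mtime INTEGER)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON files ({index_columns})")
    details = plan_details(conn, QUERY, (1,))
    conn.close()
    # SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY".
    return any("TEMP B-TREE" in detail for detail in details)


single_column_sorts = needs_sort("owner_id")
composite_sorts = needs_sort("owner_id, file_mtime")
print("index on (owner_id) sorts:", single_column_sorts)
print("index on (owner_id, file_mtime) sorts:", composite_sorts)
```

SQLite's planner is not TiDB's, so this only shows the general principle. On the real cluster, running `EXPLAIN ANALYZE` on the slow statement would show which index is chosen and whether a sort or TopN step remains.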

| username: 小龙虾爱大龙虾

See pingcap/tidb Pull Request #47716 by crazycs520: "*: update tikv/client-go to fix batch-client send loop panic issue". The panic doesn’t seem to have any impact.

| username: chnage

The TiDB panic itself doesn’t have much impact, but it triggers an alert. Compared with v4.0.16, however, SQL execution is now being interrupted, which affects business operations.

| username: 小龙虾爱大龙虾

Does the panic cause SQL errors?

| username: chnage

Currently, it is unclear whether the panic is related to queries being interrupted at the maximum SQL execution time, but the business side has reported that their SQL executions were interrupted.
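
One way to check for a correlation would be to pull the timestamps of the `batchSendLoop` errors out of the TiDB log and compare them with the times at which the business saw interrupted statements. The following is a minimal sketch under stated assumptions: the header pattern is inferred only from the single log line quoted above, the sample line is truncated, and the `interrupted_at` time and the 30-second window are hypothetical inputs.

```python
import re
from datetime import datetime, timedelta

# Header layout inferred from the one log line quoted in this thread:
# [timestamp] [LEVEL] [source:line] [message] ...
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\] "
    r"\[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] \[(?P<msg>[^\]]+)\]"
)


def panic_times(lines, message="batchSendLoop"):
    """Collect timestamps of ERROR entries whose message field matches."""
    times = []
    for line in lines:
        match = HEADER.match(line)
        if match and match.group("level") == "ERROR" and match.group("msg") == message:
            times.append(
                datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S.%f %z")
            )
    return times


def panics_near(times, moment, seconds=30):
    """Return the panic timestamps within +/- `seconds` of `moment`."""
    window = timedelta(seconds=seconds)
    return [t for t in times if abs(t - moment) <= window]


# Truncated copy of the quoted log line, followed by a stack frame line
# that has no header and is therefore skipped.
sample = [
    '[2024/01/24 09:01:02.866 +08:00] [ERROR] [client_batch.go:303] [batchSendLoop] [r={}]',
    "runtime.gopanic",
]
times = panic_times(sample)

# Hypothetical time at which the business reported an interrupted statement.
interrupted_at = datetime.strptime("2024/01/24 09:01:10.000 +08:00", "%Y/%m/%d %H:%M:%S.%f %z")
print(panics_near(times, interrupted_at))
```

If the panics cluster around the interruptions, that would support a link between the two; if they do not, the timeouts and the panic are more likely separate problems.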

| username: chnage

Additionally, the memory usage has increased.

| username: 哈喽沃德

Have all the statistics become invalid?

| username: tidb菜鸟一只

Is the query fast if you run it directly without the order by and limit, that is: select fileid, userid, chkcode, ctime, sid, ascii_sid, permission, source, file_mtime, ext_perm from xxxx where xxxid = xxxx?

| username: oceanzhang

This definitely affects the business. I suspect it’s a limit pagination issue.
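
If pagination with `limit offset, count` does turn out to be the problem, a common alternative is keyset (seek) pagination, where each page starts after the last row of the previous page instead of skipping `offset` rows. The thread does not confirm that this is the cause, so the following is only a sketch. It runs against an in-memory SQLite database as a stand-in for TiDB, and the table `files`, the column `owner_id` (standing in for xxxid), and the index name are hypothetical. It also adds `fileid` as a tie-breaker, which the original query does not have, because keyset pagination needs a unique sort order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    "fileid INTEGER PRIMARY KEY, owner_id INTEGER, file_mtime INTEGER)"
)
conn.execute("CREATE INDEX idx_owner_mtime ON files (owner_id, file_mtime, fileid)")
conn.executemany(
    "INSERT INTO files (fileid, owner_id, file_mtime) VALUES (?, ?, ?)",
    # Pairs of rows share a file_mtime on purpose, to exercise the tie-breaker.
    [(i, 1, 1000 + i // 2) for i in range(1, 101)],
)

ORDER = "ORDER BY file_mtime DESC, fileid DESC"


def page_by_offset(owner_id, offset, size):
    """Offset pagination: the engine still has to walk past `offset` rows."""
    return conn.execute(
        f"SELECT fileid, file_mtime FROM files WHERE owner_id = ? {ORDER} "
        "LIMIT ? OFFSET ?",
        (owner_id, size, offset),
    ).fetchall()


def page_after(owner_id, last, size):
    """Keyset pagination: seek directly to the rows after `last`."""
    if last is None:
        return conn.execute(
            f"SELECT fileid, file_mtime FROM files WHERE owner_id = ? {ORDER} LIMIT ?",
            (owner_id, size),
        ).fetchall()
    last_fileid, last_mtime = last
    return conn.execute(
        "SELECT fileid, file_mtime FROM files WHERE owner_id = ? "
        "AND (file_mtime < ? OR (file_mtime = ? AND fileid < ?)) "
        f"{ORDER} LIMIT ?",
        (owner_id, last_mtime, last_mtime, last_fileid, size),
    ).fetchall()


# Walk all pages both ways and compare the results.
keyset_rows, last = [], None
while True:
    page = page_after(1, last, 10)
    if not page:
        break
    keyset_rows.extend(page)
    last = page[-1]  # (fileid, file_mtime) of the final row on this page

offset_rows = [row for off in range(0, 100, 10) for row in page_by_offset(1, off, 10)]
print(keyset_rows == offset_rows, len(keyset_rows))
```

The trade-off is that keyset pagination only supports moving to the next page from a known position; jumping to an arbitrary page number still needs an offset.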