TiDB Out-of-Memory Loop Restart

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB内存溢出循环重启

| username: TiDBer_uXv8htBz

Frequent out-of-memory restarts, with an interval of about 3 minutes. I really can’t find a solution. Can any expert provide some guidance?

| username: h5n1 | Original post link

Did you enable SPM automatic evolution? Check the setting of the tidb_evolve_plan_baselines variable.
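For reference, the setting can be checked from any SQL client (a minimal check; a value of OFF means the feature is disabled):

SHOW GLOBAL VARIABLES LIKE 'tidb_evolve_plan_baselines';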

| username: TiDBer_uXv8htBz | Original post link

This variable is turned off.

| username: TiDBer_uXv8htBz | Original post link

Here is all the configuration information. Please, experts, give some guidance. Much appreciated :pray::pray::pray:

| username: Billmay表妹 | Original post link

What version?
Steps to reproduce the issue?

| username: TiDBer_uXv8htBz | Original post link

Version v4.0.0.
Steps


Error log information:
goroutine 479 [select]:
github.com/pingcap/tidb/domain.(*Domain).handleEvolvePlanTasksLoop.func1(0xc0003d3440, 0x3707560, 0xc00270c200)
/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/domain/domain.go:913 +0x1c1
created by github.com/pingcap/tidb/domain.(*Domain).handleEvolvePlanTasksLoop
/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/domain/domain.go:908 +0x73

goroutine 464 [select, 3 minutes]:
go.etcd.io/etcd/clientv3.(*lessor).keepAliveCtxCloser(...)

{"level":"warn","ts":"2022-08-09T11:41:47.308+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-e1f52210-d6f8-495d-a667-b2c01ae462e3/192.168.0.41:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2022-08-09T11:41:48.196+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-48902fbb-0317-4aa5-815b-a8c309ad3ed1/192.168.0.41:2379","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2022-08-09T11:41:48.312+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-e1f52210-d6f8-495d-a667-b2c01ae462e3/192.168.0.41:2379","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2022-08-09T11:57:42.942+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-b7122453-edc5-401e-9c15-f3b19a6957d0/192.168.0.41:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2022-08-09T11:57:50.062+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-b318aef7-2b8e-4d96-8c70-01cc8176f889/192.168.0.41:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2022-08-09T11:57:51.967+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-b318aef7-2b8e-4d96-8c70-01cc8176f889/192.168.0.41:2379","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2022-08-09T12:01:54.049+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-804d5f0a-1041-4a8f-b792-604a99da1e6e/192.168.0.41:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2022-08-09T12:01:54.934+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-68b7b320-1768-45ce-ae37-2607d87a9a83/192.168.0.41:2379","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2022-08-09T12:01:55.055+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-804d5f0a-1041-4a8f-b792-604a99da1e6e/192.168.0.41:2379","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x318b551, 0x16)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sysMap(0xc720000000, 0x4000000, 0x51aae98)
/usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x5190d60, 0x30000, 0xffffffff010dd5c9, 0x29dbff6)
/usr/local/go/src/runtime/malloc.go:701 +0x1cd
runtime.(*mheap).grow(0x5190d60, 0x18, 0xffffffff)
/usr/local/go/src/runtime/mheap.go:1252 +0x42
runtime.(*mheap).allocSpanLocked(0x5190d60, 0x18, 0x51aaea8, 0x4d15320)
/usr/local/go/src/runtime/mheap.go:1163 +0x291
runtime.(*mheap).alloc_m(0x5190d60, 0x18, 0x101, 0x0)
/usr/local/go/src/runtime/mheap.go:1015 +0xc2
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:1086 +0x4c
runtime.(*mheap).alloc(0x5190d60, 0x18, 0x7fdba5000101, 0x112b580)
/usr/local/go/src/runtime/mheap.go:1085 +0x8a
runtime.largeAlloc(0x30000, 0x1150100, 0xc71ffd4000)
/usr/local/go/src/runtime/malloc.go:1138 +0x97
runtime.mallocgc.func1()
/usr/local/go/src/runtime/malloc.go:1033 +0x46
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146

High memory usage causes tidb-server to restart automatically (randomly across the three machines).
Server configuration:
pd_servers, tidb_servers: 16 cores, 32 GB RAM, 500 GB SSD (3 machines)
tikv_servers: 16 cores, 32 GB RAM, 1 TB SSD (3 machines)

| username: Billmay表妹 | Original post link

Please refer to the following posts:

Since you are using an older version, please upgrade to the latest version if possible.

| username: XuHuaiyu-PingCAP | Original post link

Is the posted error log complete?

| username: TiDBer_uXv8htBz | Original post link

tidb_stderr.log (10.0 MB)

| username: cheng | Original post link

Common causes of OOM:

  1. Large data volume or high concurrency leading to excessive memory usage.

  2. Check whether any slow SQL queries scanning large data volumes ran right before the OOM. You can look at the slow queries in the dashboard, and check STATEMENTS_SUMMARY and STATEMENTS_SUMMARY_HISTORY to see which statements consume the most memory (see the example query after this list).

  3. If tidb_analyze_version = 2, change the parameter back to 1 and follow the steps in the link below to delete the existing statistics collected with version 2:
    Statistics | PingCAP Documentation Center

  4. Other bugs.
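For item 2, a query along these lines surfaces the statements with the highest memory usage (a sketch against the statement summary tables named above; AVG_MEM and MAX_MEM are in bytes, and the summary tables must be enabled):

SELECT DIGEST_TEXT, EXEC_COUNT, AVG_MEM, MAX_MEM
FROM information_schema.statements_summary_history
ORDER BY MAX_MEM DESC
LIMIT 10;

For item 3, note that tidb_analyze_version only exists in v5.1 and later, so it does not apply to a v4.0 cluster; on newer versions you would check it with:

SHOW GLOBAL VARIABLES LIKE 'tidb_analyze_version';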

| username: XuHuaiyu-PingCAP | Original post link

Try changing this configuration item to 0?

Set it to 0 on all tidb-servers:

[performance]
feedback-probability = 0.0
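If the cluster is managed by TiUP, one way to roll this out (a sketch; the cluster name is a placeholder) is to put the item under server_configs.tidb in the topology and then reload only the tidb component:

tiup cluster edit-config <cluster-name>     # add performance.feedback-probability: 0.0 under server_configs.tidb
tiup cluster reload <cluster-name> -R tidb  # rolling restart of the tidb-server instances only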

| username: TiDBer_uXv8htBz | Original post link

The configuration has been added, but the frequent restart issue still occurs.

| username: jansu-dev | Original post link

  1. Please capture the debug data right before a restart via http://{TiDBIP}:10080/debug/zip?seconds=60 (see the curl example below).
  2. Also send the memory usage trend of the tidb-server process on the host, covering the period before and after the OOM.
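For step 1, the endpoint can be saved with curl; this keeps the {TiDBIP} placeholder from the post, and should be run shortly before an expected restart so the 60-second profiling window covers the memory climb:

curl -o tidb_debug.zip 'http://{TiDBIP}:10080/debug/zip?seconds=60'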