tidb-server and PD frequently at 99.9% memory usage

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb_server、pd经常内存99.9%

| username: 等一分钟

tidb-server and PD are installed on the same machine. Memory usage frequently exceeds 99.5%, which causes the machine to freeze. Is there a solution?

| username: 等一分钟 | Original post link

I think the problem is that the tidb-server process is not running. You can check if the process is running with the following command:

ps -ef | grep tidb-server

| username: h5n1 | Original post link

How much memory does the machine have, and what is mem_quota_query set to? There is also a resource_control parameter that can cap the memory size.

| username: 等一分钟 | Original post link

The machine has 32GB of memory, and mem_quota_query is not set.

| username: 等一分钟 | Original post link

Is it possible to find out which SQL statements are consuming a lot of memory?

| username: h5n1 | Original post link

You can use pprof to check TiDB’s memory usage. This requires Go to be installed.

curl -G 10.161.67.82:10080/debug/pprof/heap > db.heap.prof

Then use the Go tool pprof:

go tool pprof db.heap.prof

and run the top command at the (pprof) prompt to see which functions hold the most memory.
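The steps above can also be run without the interactive prompt. The following is a minimal sketch, not a definitive procedure. It assumes Go and curl are installed and that the TiDB status port is reachable. The helper names (heap_profile_url, dump_and_show_top, top_mem_sessions) and the connection defaults are hypothetical. Because the heap profile reports memory by Go function rather than by SQL statement, the sketch also queries the MEM column of information_schema.processlist, which lists statements currently running on the TiDB instance you connect to.

```shell
#!/usr/bin/env bash
# Assumed defaults; replace with your own TiDB status address and SQL endpoint.
TIDB_STATUS_ADDR="${TIDB_STATUS_ADDR:-10.161.67.82:10080}"
TIDB_HOST="${TIDB_HOST:-127.0.0.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
TIDB_USER="${TIDB_USER:-root}"

heap_profile_url() {
  # Build the heap profile endpoint for a given status address (host:port).
  printf 'http://%s/debug/pprof/heap\n' "$1"
}

dump_and_show_top() {
  # Download the heap profile, then print the same table as the
  # interactive `top` command, limited to 10 entries.
  local out="db.heap.prof"
  curl -sS -o "$out" "$(heap_profile_url "$TIDB_STATUS_ADDR")" || return 1
  go tool pprof -top -nodecount=10 "$out"
}

top_mem_sessions_sql() {
  # Statements currently running on this TiDB instance, largest MEM first.
  cat <<'SQL'
SELECT ID, USER, DB, TIME, MEM, LEFT(INFO, 120) AS sql_text
FROM information_schema.processlist
ORDER BY MEM DESC
LIMIT 10;
SQL
}

top_mem_sessions() {
  # -p makes the mysql client prompt for the password.
  top_mem_sessions_sql | mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p
}

# Nothing runs on load. On a host that can reach TiDB, call:
#   dump_and_show_top
#   top_mem_sessions
```

The processlist query only sees statements that are running at that moment, so it is most useful while memory is climbing.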

| username: 等一分钟 | Original post link

If the TiDB cluster was installed with tiup, should I use tiup cluster edit-config to modify mem_quota_query? Does the cluster need to be restarted?

| username: h5n1 | Original post link

I believe you need to reload for the change to take effect. Reloading only the TiDB nodes with tiup cluster reload <cluster-name> -R tidb should be enough.
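As a rough sketch of that edit-and-reload flow: the cluster name below is a placeholder, and quota_bytes and apply_tidb_config are hypothetical helpers. It also assumes a TiDB version that still reads mem-quota-query from the configuration file; newer versions set the limit through the tidb_mem_quota_query system variable instead.

```shell
#!/usr/bin/env bash
# Placeholder cluster name; `tiup cluster list` shows the real one.
CLUSTER_NAME="${CLUSTER_NAME:-my-tidb-cluster}"

quota_bytes() {
  # Convert a size in GiB to the byte value the config item expects.
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

apply_tidb_config() {
  # Step 1: open the cluster topology in $EDITOR. Under server_configs,
  # add or change the item, for example (2 GiB, see `quota_bytes 2`):
  #
  #   server_configs:
  #     tidb:
  #       mem-quota-query: 2147483648
  #
  tiup cluster edit-config "$CLUSTER_NAME" || return 1

  # Step 2: push the new config and do a rolling restart of the
  # tidb role only; PD and TiKV are left alone.
  tiup cluster reload "$CLUSTER_NAME" -R tidb
}

# Nothing runs on load. Call apply_tidb_config on the tiup control machine.
```

Since reload restarts the tidb-server instances, connections to each instance are dropped while it restarts.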

| username: 等一分钟 | Original post link

After setting the mem_quota_query parameter, will SQL queries that exceed this limit directly report an error?

| username: 等一分钟 | Original post link

[root@prod-sh1-tidb-pd3-0002 ~]# go tool pprof db.heap.prof
File: tidb-server
Build ID: f802f4591b52719a3d3659c256db53dde27d2c61
Type: inuse_space
Time: Aug 2, 2022 at 5:07pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1938.98MB, 75.05% of 2583.60MB total
Dropped 556 nodes (cum <= 12.92MB)
Showing top 10 nodes out of 129
      flat  flat%   sum%        cum   cum%
  861.70MB 33.35% 33.35%   861.70MB 33.35%  github.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopResponse
  264.80MB 10.25% 43.60%   264.80MB 10.25%  google.golang.org/grpc.(*parser).recvMsg
  193.04MB  7.47% 51.07%   193.04MB  7.47%  github.com/pingcap/tidb/store/copr.coprCacheBuildKey
  163.21MB  6.32% 57.39%   163.21MB  6.32%  github.com/pingcap/tidb/statistics.NewCMSketch
  113.03MB  4.38% 61.77%   113.03MB  4.38%  reflect.New
  100.97MB  3.91% 65.67%   130.97MB  5.07%  github.com/pingcap/tidb/executor.(*baseHashAggWorker).getPartialResult
   86.23MB  3.34% 69.01%    98.78MB  3.82%  github.com/pingcap/tidb/util/chunk.(*Column).AppendBytes
   66.09MB  2.56% 71.57%    66.09MB  2.56%  bytes.makeSlice
   45.67MB  1.77% 73.34%    45.67MB  1.77%  github.com/pingcap/tidb/util/chunk.newVarLenColumn
   44.23MB  1.71% 75.05%    44.23MB  1.71%  github.com/pingcap/tidb/kv.(*HandleMap).Set
(pprof)

| username: 等一分钟 | Original post link

Can you tell anything from this?

| username: 等一分钟 | Original post link

[root@prod-sh1-tidb-pd3-0002 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          32009       30489         669           4         850        1165
Swap:         31999        3846       28153
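The free -m output shows about 30 GB in use on the machine, while the heap profile posted earlier accounts for 2583.60MB in tidb-server at the moment it was taken. It may therefore be worth checking which processes actually hold the memory. The following is a small sketch for Linux hosts; mem_used_pct and top_rss are hypothetical helper names, and the two measurements in the thread may not have been taken at the same time.

```shell
#!/usr/bin/env bash
mem_used_pct() {
  # Reads `free -m` output on stdin and prints
  # (total - available) / total as a percentage with one decimal.
  awk '/^Mem:/ { printf "%.1f\n", ($2 - $7) / $2 * 100 }'
}

top_rss() {
  # Header line plus the largest processes by resident memory
  # (RSS, in KiB), using the procps `ps` found on most Linux systems.
  ps -eo pid,rss,comm --sort=-rss | head -n "${1:-10}"
}

# Usage on a live host:
#   free -m | mem_used_pct
#   top_rss 5
```

With the numbers posted in this thread (32009 total, 1165 available), mem_used_pct prints 96.4. If tidb-server and pd-server both appear near the top of the top_rss list, they are competing for the same 32 GB.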

| username: h5n1 | Original post link

You can use Go and Graphviz to view the profile graphically. There is a relatively detailed page on this, and both can be installed on Windows. Generally, if the memory used by a SQL statement exceeds mem_quota_query, the statement is killed.

| username: 等一分钟 | Original post link

[Screenshot not available in the translated text.]

| username: 等一分钟 | Original post link

Do you recommend setting these parameters?

| username: h5n1 | Original post link

First, try adjusting tidb_mem_quota_query.

| username: 等一分钟 | Original post link

The default value of tidb_mem_quota_query is 1G. Does this mean that if it exceeds 1G, the SQL will be killed?

| username: 等一分钟 | Original post link

[Screenshot not available in the translated text.]

| username: 等一分钟 | Original post link

This is the memory used while this SQL statement was executing, but the statement does not appear to have been killed.
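One possible explanation, offered as an assumption rather than a diagnosis: in TiDB versions that use the oom-action configuration item, a statement that exceeds the quota is only cancelled when oom-action is set to cancel, and with log TiDB just writes a log entry. Newer versions control this with the tidb_mem_oom_action system variable instead. The sketch below checks the current settings; the function names and connection defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed connection defaults; adjust for your cluster.
TIDB_HOST="${TIDB_HOST:-127.0.0.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
TIDB_USER="${TIDB_USER:-root}"

quota_check_sql() {
  # The per-statement quota in bytes, and what TiDB does when it is exceeded.
  cat <<'SQL'
SHOW VARIABLES LIKE 'tidb_mem_quota_query';
SHOW CONFIG WHERE type = 'tidb' AND name = 'oom-action';
SQL
}

run_quota_check() {
  # -p makes the mysql client prompt for the password.
  quota_check_sql | mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p
}

# Nothing runs on load. Call run_quota_check against a live cluster.
```

If the second statement returns no rows, the TiDB version in use probably no longer exposes oom-action, and the system variable is the place to look.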

| username: 等一分钟 | Original post link

Can the server-memory-quota parameter be configured in tiup cluster edit-config?