TiDB Node Memory Not Released Causing Node Restart

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb节点内存不释放导致节点重启

| username: TiDBer_Y2d2kiJh

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.4.0 2tidb 2pd 3tikv
【Reproduction Path】The memory of the tidb node keeps increasing without being released. When the memory is fully occupied, it causes the node to restart, as shown in the red box.
【Encountered Problem: Problem Phenomenon and Impact】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】

| username: zhanggame1 | Original post link

Try adjusting tidb_mem_quota_query and then check which specific SQL is consuming the memory.

| username: xingzhenxiang | Original post link

It is normal for the TiDB server to run out of memory. Monitor slow SQL more frequently and arrange read-write separation reasonably. It is generally caused by reads. After read-write separation, manually intervene in advance when memory exhaustion occurs.

| username: AeolusX | Original post link

You need to check if this memory growth has been continuously increasing since you restarted TiDB. If it is continuously increasing, it might be caused by a bug in this version. Try setting tidb_analyze_version to 1.

| username: xfworld | Original post link

The memory of TiDB nodes is released according to the GC cycle scheduling.

Reference documents, describing principles and configurations:

However, releasing memory that is in use is definitely unreasonable. The memory that can be released must be the memory that is no longer in use.
So, if a query occupies memory for a long time, what problems will it cause? It will not be able to be released.

For slow queries, long-term optimization is needed to reduce memory usage. Additionally, versions 6.1.X and 6.5.X have made significant optimizations for memory release (the occurrence of OOM has been greatly reduced). It is recommended to upgrade after POC.

| username: TiDBer_Y2d2kiJh | Original post link

The tidb_analyze_version=1 has already been set, but after observing for a period of time, the memory still increases and is not released.

| username: Billmay表妹 | Original post link

Consider upgrading.

| username: TiDBer_Y2d2kiJh | Original post link

Which version should I upgrade to? Will this issue not occur in higher versions?

| username: 像风一样的男子 | Original post link

At least 3 PDs are required in a production environment.

| username: TiDBer_Y2d2kiJh | Original post link

May I ask what this is determined by?

| username: 像风一样的男子 | Original post link

The leader of PD is elected. If one of your two nodes goes down, it won’t be able to hold an election. The effect of having two nodes is the same as having a single node.

| username: xingzhenxiang | Original post link

A script to directly kill SQL queries that exceed memory or time limits is more effective than setting parameters.

for list in ` /server/mysql5.7/bin/mysql -hXX.XX.XX.XX -p'password'  -vvv -e " select id  from  INFORMATION_SCHEMA.processlist a where a.info is not null and (mem >=11474836480 or time >600);"  |grep -Ev 'id|ID|iD|Id' |awk -F "|" '{print $2}'`
echo $list
 /server/mysql5.7/bin/mysql -hXX.XX.XX.XX -p'password'  -vvv -e "select id,time,info,mem  from  INFORMATION_SCHEMA.processlist a where id=$list and a.info is not null;;" > /sh/killlog/`date +%s`.log
 /server/mysql5.7/bin/mysql -hXX.XX.XX.XX -p'password'  -vvv -e " kill tidb $list ;"
| username: TiDBer_Y2d2kiJh | Original post link

I used your query, but there were no matching results. There are many SQL statements in the sleep state. I set the idle connection to disconnect after 5 minutes, and the number of connections decreased, but the memory was still not released.

| username: xingzhenxiang | Original post link

I have set it to 10G, but you can modify the value according to your needs to meet your requirements.