Is there a quick way or parameter to reclaim memory for tidb-server when the memory consumption of the tidb-server service is nearly exhausted?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在tidb-server服务出现内存快要消耗怠尽时,对tidb-server有没有快速回收内存的办法或参数?

| username: vcdog

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact]
When the tidb-server service is about to run out of memory, is there a quick way or parameter to reclaim memory for tidb-server?


[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: maokl | Original post link

Same question!

| username: vcdog | Original post link

The R&D personnel reported that the total memory of the tidb-server server is 94.3 GiB. The parameters are as follows: tidb_server_memory_limit is configured to 64G, and tidb_server_memory_limit_gc_trigger is set to 0.9, so GC should be triggered at 64*0.9=57.6G. Why did it run up to over 70G, eventually causing an OOM and instance restart?

| username: dba-kit | Original post link

The memory statistics of TiDB-Server are not very accurate, and some scenarios may not be accounted for. It is best to use Cgroup to limit it from the operating system. However, once it is limited, TiDB-Server may trigger the operating system’s OOM Kill, causing the service to restart.

| username: dba-kit | Original post link

Additionally, when TiDB terminates SQL, it processes them one by one. If the processing speed cannot keep up with the rate of increase, it may still exceed the limit. Similarly, GC also has a speed limit, as stated in the documentation: This GC method will trigger at most once per minute.

| username: dba-kit | Original post link

You can refer to this document for details: TiDB 内存控制文档 | PingCAP 归档文档站

| username: TiDBer_oHSwKxOH | Original post link

There is no way to reclaim memory, we can only limit large memory SQL.

| username: tidb菜鸟一只 | Original post link

How much is the tidb_mem_quota_query parameter set to?

| username: 有猫万事足 | Original post link

This parameter GOMEMLIMIT is worth trying.

Use all available means.

| username: 像风一样的男子 | Original post link

The key to solving the problem is to optimize the SQL statements that consume a lot of resources. If optimization is not possible on the business side, you can consider using TiFlash. Ultimately, if money can solve the problem, it’s not really a problem. Quickly scale up TiDB.

| username: vcdog | Original post link

tidb_mem_quota_query

| username: tidb菜鸟一只 | Original post link

A single SQL is 15G? If you run a few concurrent large SQLs, it will directly bring TiDB to a halt. I think you should first look into optimizing those large SQLs.

| username: zhanggame1 | Original post link

There is no good solution; the only options for parameter adjustment are to kill the SQL or the TiDB server itself. The fundamental solution is to address the SQL memory usage.

| username: 大飞哥online | Original post link

This is really intense, haha. Let’s tone it down a bit.

| username: DBRE | Original post link

I hope there will be a parameter to reclaim memory. The key is that setting tidb_mem_quota_query is useless. Even when there are no SQL executions and no connections, this memory is not released. :sob: :joy: :sweat_smile:

| username: Fly-bird | Original post link

Set the maximum memory for a query.

| username: Soysauce520 | Original post link

Is it 6.5? The 6.5 limit is better.

| username: dba-kit | Original post link

TiDB should automatically recycle. Are you sure all connections have been released? Execute the following SQL to check if there are any remaining large transactions that haven’t been released in the corresponding INSTANCE:

select INSTANCE, START_TIME, SESSION_ID, USER, STATE from INFORMATION_SCHEMA.CLUSTER_TIDB_TRX order by START_TIME;
| username: dba-kit | Original post link

Based on version 6.5, there is no thorough solution to prevent a continuous influx of slow SQLs. The only approach is to analyze specific cases through top-sql to identify which SQLs caused the OOM at that time and have the business optimize them.

A complete solution can be expected with the automatic rate limiting of resource control. However, the resource control features in version 7.1 are limited and do not support this. Version 7.2’s resource control can address this issue, but it is not an LTS version. We will have to wait until the end of the year for the 7.5 LTS version to be released. In the meantime, you can check out the resource control features in version 7.3, which already allows for automatic slowdown/blocking of the same type of slow queries.

| username: DBRE | Original post link

The instance in the screenshot above has been restarted and replaced with a new tidb-server, version v5.2.2.