Memory Control Not Effective in tidb-server 5.3.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-server 5.3.0 内存控制不生效

| username: liujia

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.3.0
[Encountered Problem] Running a very slow SQL on tidb-server (joining a table with 80 million rows and another with 100 million rows) causes OOM. Memory control settings such as mem-quota-query and oom-action are not effective.

Configuration is as follows:
server_configs:
  tidb:
    log.slow-threshold: 300
    mem-quota-query: 8589934592
    oom-action: cancel
    oom-use-tmp-storage: true
    performance.server-memory-quota: 30212254720
    performance.txn-total-size-limit: 2173741824
    tmp-storage-path: /home/shared/tmp-storage
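
The effective values can be double-checked from a SQL client (a quick sanity check; SHOW CONFIG has been available since v4.0, and the session variable mirrors mem-quota-query):

-- instance-level configuration as loaded by this tidb-server
SHOW CONFIG WHERE type = 'tidb' AND name LIKE '%quota%';
-- per-session memory quota derived from mem-quota-query
SELECT @@tidb_mem_quota_query;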

tidb-server log:
[2022/09/28 16:31:38.100 +08:00] [WARN] [memory_usage_alarm.go:140] ["tidb-server has the risk of OOM. Running SQLs and heap profile will be recorded in record path"] ["is server-memory-quota set"=true] [server-memory-quota=30212254720] ["tidb-server memory usage"=24235258432] [memory-usage-alarm-ratio=0.8] ["record path"="/home/shared/tmp-storage/1004_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage/record"]
[2022/09/28 16:31:46.500 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.07822628s] [cop_time=12.266157414s] [process_time=287.003s] [wait_time=0.973s] [request_count=374] [total_keys=347676927] [process_keys=174355378] [num_cop_tasks=374] [process_avg_time=0.7673877s] [process_p90_time=1.313s] [process_max_time=2.1390000000000002s] [process_max_addr=dx-op-tidb-tikv3-pm:20160] [wait_avg_time=0.002601604s] [wait_p90_time=0.003s] [wait_max_time=0.095s] [wait_max_addr=dx-op-tidb-tikv3-pm:20160] [stats=ods_bort_trade_t_order_item_part:436299555451174938,ods_bort_trade_t_order_part:436300359591002118] [conn_id=3] [user=dbadmin_01] [database=pipe_data_realtime] [table_ids="[18951,78473,18951,78473]"] [txn_start_ts=436300374716710913] [mem_max="10375289917 Bytes (9.66 GB)"] [sql="select 0 as id, scencekey ,10001 as resourceType, productid as resourceId, bizid as bizId, sum(quantity) as soldCount, CURRENT_timestamp() as dbctime, CURRENT_timestamp() as dbutime from (select id,scencekey from pipe_data_realtime.ods_bort_trade_t_order_part where state <> '4') a join (select orderitemid, orderid, productid, variantid, quantity, bizid, refundstate, transferstate from pipe_data_realtime.ods_bort_trade_t_order_item_part where refundstate <> 2 and transferstate <> 1 )b on a.id = b.orderid group by scencekey, bizid , productid union all select 0 as id, scencekey ,10002 as resourceType, variantid as resourceId, bizid as bizId, sum(quantity) as soldCount , CURRENT_timestamp() as dbctime, CURRENT_timestamp() as dbutime from(select id,scencekey from pipe_data_realtime.ods_bort_trade_t_order_part where state <> '4') a join (select orderitemid, orderid, productid, variantid, quantity, bizid, refundstate, transferstate from pipe_data_realtime.ods_bort_trade_t_order_item_part where refundstate <> 2 and transferstate <> 1 ) b on a.id = b.orderid group by scencekey, bizid , variantid"]
[2022/09/28 16:31:46.931 +08:00] [INFO] [coprocessor.go:1149] ["memory exceeds quota, destroy one token now."] [consumed=10378389257] [quota=8589934592] ["total token count"=80] ["remaining token count"=37]

[Expected Outcome]
tidb-server should kill SQL queries that exceed memory limits according to the configuration, instead of causing OOM.

| username: wuxiangdong | Original post link

This indicates that the configuration has taken effect.

| username: liujia | Original post link

However, the SQL was not killed.
After checking the log, it was only throttled, not cancelled.

Flow Control

  • TiDB supports dynamic memory control for the operators that read data. By default, a data-reading operator uses the maximum number of threads allowed by tidb_distsql_scan_concurrency. When the memory usage of a single SQL statement exceeds tidb_mem_quota_query, the data-reading operator stops one thread.
  • This flow control behavior is controlled by the system variable tidb_enable_rate_limit_action.
  • When flow control is triggered, a log line containing the keyword memory exceeds quota, destroy one token now is printed; this matches the coprocessor.go entry in the log above.
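
In other words, while tidb_enable_rate_limit_action is ON (the default in 5.x), exceeding mem-quota-query triggers this flow control first rather than the configured oom-action. A sketch of disabling it so that oom-action: cancel can actually fire (verify the behavior on your exact version):

-- disable flow control so that exceeding tidb_mem_quota_query
-- triggers oom-action (cancel) instead of throttling
SET GLOBAL tidb_enable_rate_limit_action = OFF;
-- a query over the quota should then fail with an
-- "Out Of Memory Quota!" error rather than being throttled
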
| username: 特雷西-迈克-格雷迪 | Original post link

You can try using TiFlash for acceleration; also check whether more filter conditions can be applied. What matters most is the size of the result set, not the size of your tables.

| username: liujia | Original post link

The result set is very large, which will exhaust the tidb-server's memory. So I want such queries to be killed to prevent the tidb-server from OOMing, but the memory control is not taking effect.

| username: 人如其名 | Original post link

You didn't post the execution plan. I suspect the memory is mainly consumed by hash aggregation. Try these session-level settings:
set tidb_hashagg_final_concurrency=1;
set tidb_hashagg_partial_concurrency=1;
This keeps the aggregation's memory under control: in 5.x, the HashAgg operator can only spill to disk when both concurrencies are set to 1.
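
A minimal way to confirm the effect (a hypothetical check; the table and column names are taken from the query in the log above, and EXPLAIN ANALYZE shows a disk column when an operator spills):

-- run an aggregation with concurrency 1, then inspect the "disk"
-- column of the HashAgg operator in the EXPLAIN ANALYZE output
set tidb_hashagg_final_concurrency=1;
set tidb_hashagg_partial_concurrency=1;
explain analyze
select productid, sum(quantity)
from pipe_data_realtime.ods_bort_trade_t_order_item_part
where refundstate <> 2 and transferstate <> 1
group by productid;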

| username: 特雷西-迈克-格雷迪 | Original post link

It should be possible. Check whether there is a known bug matching this behavior; otherwise, try upgrading to a newer version.