[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4.2
Question: A large SQL query in TiDB can exhaust the system memory and cause an OOM. Is there a parameter to control this and prevent OOM?
First Scenario
Running the SQL directly, without any hint. Because the query exceeds the configured mem-quota-query=500M limit, TiDB immediately reports “Out Of Memory Quota!”. This shows the mem-quota-query parameter is working as intended — it is TiDB’s memory-protection mechanism — so no problem here.
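Besides the instance-level config item, the same quota can be adjusted per session through the `tidb_mem_quota_query` system variable (value in bytes). A minimal sketch, assuming the 500 MiB limit from this thread:

```sql
-- Sketch: per-session query memory quota, in bytes.
-- 524288000 bytes = 500 MiB, matching mem-quota-query=500M above.
SET tidb_mem_quota_query = 524288000;

-- Verify the effective value for the current session:
SELECT @@tidb_mem_quota_query;
```

This only changes the current session, which is handy for testing one large query without touching the instance config.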
[SQL]
SELECT
  wgc.*, wu.nickname AS 'sendUserNickName', wu.channel_id AS register_channel_id,
  b.charm_score * wgc.gift_amount AS charm_score
FROM weplay_gift_consume wgc
LEFT JOIN weplay_user wu ON wgc.uid = wu.uid
LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id
ORDER BY wgc.create_time DESC
[Err] 1105 - Out Of Memory Quota![conn_id=5]
Second Scenario
With a hint, the SQL keeps running until the operating system OOMs. Is there a parameter that prevents a single SQL statement from consuming all system memory and triggering an OOM?
SELECT
  /*+ MERGE_JOIN(wu) */
  wgc.*, wu.nickname AS 'sendUserNickName', wu.channel_id AS register_channel_id,
  b.charm_score * wgc.gift_amount AS charm_score
FROM weplay_gift_consume wgc
LEFT JOIN weplay_user wu ON wgc.uid = wu.uid
LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id
ORDER BY wgc.create_time DESC
I then applied the following session settings:
set tidb_enable_rate_limit_action = 0;
set tidb_distsql_scan_concurrency = 1;
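The two variables above slow down how fast memory grows, but whether spilling actually relieves pressure depends on the instance-level config. A sketch of the relevant tidb.toml items in TiDB 5.x (names are documented config items; the values here are illustrative, not recommendations):

```toml
# tidb.toml sketch -- spill-to-disk and per-query memory settings
oom-use-tmp-storage = true           # allow operators to spill to disk when over quota
tmp-storage-path = "/data/tidb-tmp"  # illustrative path; put it on a fast SSD, not root /tmp
tmp-storage-quota = 107374182400     # illustrative 100 GiB cap; -1 means unlimited
oom-action = "cancel"                # cancel the query over quota instead of only logging
mem-quota-query = 524288000          # 500 MiB per-query quota, as used in this thread
```

If `oom-use-tmp-storage` is off, or the tmp disk is too slow or too small, the spill path cannot keep up and memory keeps growing anyway.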
System memory now grows more slowly than before, but it still climbs steadily; after running for nearly 10 minutes, memory is exhausted and the system eventually OOMs.
It seems TiDB does not yet have a “trade disk for time” mechanism like MySQL’s: MySQL may be slow, but it always finishes the job, whereas TiDB either reports “Out Of Memory Quota!” or eventually drives the system to OOM. Are large SQL queries simply unmanageable?
[2022/11/10 11:17:10.053 +08:00] [INFO] [row_container.go:368] ["memory exceeds quota, spill to disk now."] [consumed=524678879] [quota=524288000]
[2022/11/10 11:17:10.053 +08:00] [INFO] [row_container.go:562] ["memory exceeds quota, spill to disk now."] [consumed=524638922] [quota=524288000]
[2022/11/10 11:17:10.058 +08:00] [INFO] [row_container.go:368] ["memory exceeds quota, spill to disk now."] [consumed=525997030] [quota=524288000]
[2022/11/10 11:17:10.669 +08:00] [INFO] [row_container.go:368] ["memory exceeds quota, spill to disk now."] [consumed=524618199] [quota=524288000]
[2022/11/10 11:17:10.677 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=15.470587844s] [cop_time=44.804289864s] [process_time=145.775s] [wait_time=5.234s] [request_count=21621] [total_keys=7242446] [process_keys=6781539] [num_cop_tasks=21621] [process_avg_time=0.006742287s] [process_p90_time=0.017s] [process_max_time=0.887s] [process_max_addr=10.0.8.39:20160] [wait_avg_time=0.000242079s] [wait_p90_time=0.001s] [wait_max_time=0.091s] [wait_max_addr=10.0.8.39:20161] [stats=weplay_user:437269348093526017,weplay_gift_config:437246981652611087,weplay_gift_consume:437268436107395086] [conn_id=29] [user=root] [database=weplayzy] [table_ids="[1008,308,1012]"] [index_names="[weplay_gift_consume:idx_uid]"] [txn_start_ts=437269355603689479] [mem_max="658532008 Bytes (628.0 MB)"] [sql="SELECT\r\n /*+ MERGE_JOIN(wu) */\r\n wgc.*,wu.nickname AS 'sendUserNickName',wu.channel_id as register_channel_id,\r\n b.charm_score*wgc.gift_amount as charm_score from weplay_gift_consume wgc\r\n LEFT JOIN weplay_user wu ON wgc.uid = wu.uid\r\n LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id\r\n ORDER BY wgc.create_time DESC"]
What business scenario requires querying over 100 million rows at a time? This looks like something to optimize at the business level.
Yes. In the actual application, the query is constrained by a time-field filter, and one week of data stays within about 2 million rows. (The hint /*+ MERGE_JOIN(wu) */ is still needed; without it the query reports “Out Of Memory Quota!”.)
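Concretely, the time-window version of the query might look like the following sketch, built from the schema shown in this thread (the 7-day window is illustrative):

```sql
-- Sketch: bound the scan with a time window so the join/sort fits the quota.
SELECT
  /*+ MERGE_JOIN(wu) */
  wgc.*, wu.nickname AS 'sendUserNickName', wu.channel_id AS register_channel_id,
  b.charm_score * wgc.gift_amount AS charm_score
FROM weplay_gift_consume wgc
LEFT JOIN weplay_user wu ON wgc.uid = wu.uid
LEFT JOIN weplay_gift_config b ON wgc.gift_id = b.id
WHERE wgc.create_time >= NOW() - INTERVAL 7 DAY  -- illustrative window
ORDER BY wgc.create_time DESC;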
Setting aside the application scenario: for large SQL queries, TiDB’s only outcomes seem to be “Out Of Memory Quota!” or a system OOM, with no efficient disk-swap mechanism (it does spill data to disk, yet memory keeps growing and it eventually OOMs). What I want is for large SQL queries to be able to run to completion on TiDB.
Which disk is your /tmp on?
While operators spill to disk, you also need to watch the tmp-storage I/O: if the disk cannot keep up with the write rate, data still accumulates in memory and leads to OOM (Out of Memory).
How much total memory does the TiDB server have, how much tmp storage is available, and how large is the 100-million-row dataset? If all three are sufficient, the query should run to completion; if any one of them is not, an OOM is expected. As a rough sizing check, 100 million rows at an assumed ~200 bytes per joined row is around 20 GB of intermediate data that the tmp storage must absorb.
tmp has been moved to the SSD data disk; its 400 GB of space is sufficient.
After running for more than 10 minutes, the system encountered an OOM (Out of Memory) error.