oom-use-tmp-storage: Which operators support the use of temporary disk space

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: oom-use-tmp-storage:哪些算子支持使用临时磁盘空间

| username: h5n1

oom-use-tmp-storage is said to let some operations use temporary disk storage when tidb_mem_quota_query is exceeded. So which specific operators or operations can use temporary disk space to avoid OOM, and from which version is this feature available?

| username: xfworld | Original post link

Let’s wait for the experts to answer. I remember this being mentioned in a previous sharing session:
Even if temporary disks are used for data caching, some data still remains in memory. It helps, but only to a limited extent.

| username: Billmay表妹 | Original post link

Attend the moderator exchange meeting.

| username: tiancaiamao | Original post link

I grepped the code and found that the following operators handle spilling to disk:

  • sort
  • hashjoin
  • hashagg
  • cte
  • mergeJoin
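
For reference, spilling for these operators is governed by configuration along the following lines. This is a hedged sketch: the key names below are as documented for TiDB around v5.x, and in later versions (v6.3+) the `oom-use-tmp-storage` switch was migrated to the `tidb_enable_tmp_storage_on_oom` system variable; the path is a placeholder.

```toml
# tidb.toml -- illustrative fragment, not a complete config
oom-use-tmp-storage = true          # allow the operators above to spill to disk
tmp-storage-path = "/path/to/tmp"   # where spill files go (placeholder path)
tmp-storage-quota = -1              # byte limit for spill files; -1 = no limit
oom-action = "cancel"               # fallback when spilling is not possible
```
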

| username: tiancaiamao | Original post link

As for the specific version… let’s say v6.2 and later.
We have recently been doing a lot of work to avoid OOM issues (still ongoing), and you will see these changes in the upcoming releases.

As for the oom-use-tmp-storage option, enabling it does not guarantee that OOM will be avoided.
For example, if memory usage is already close to the OOM threshold when spilling starts, and spilling to disk cannot keep up with the rate of memory allocation, the query still results in OOM.

In terms of completeness of the internal implementation, tidb_mem_quota_query is better supported.
oom-use-tmp-storage covers only a subset of operators, while tidb_mem_quota_query is expected to always intercept OOM: spill to disk when possible, and otherwise kill the query (or take similar measures) to protect the TiDB process from being OOM-killed.

| username: h5n1 | Original post link

How to determine if this parameter is effective, or if temporary storage is indeed used during SQL execution?

| username: tiancaiamao | Original post link

> How to determine if this parameter is effective, or if temporary storage is indeed used during SQL execution?

How about lowering this value (tidb_mem_quota_query) and then executing the corresponding query to observe?

After setting the relevant configuration, observe whether temporary files are generated in the directory to confirm if it is being used.

The time oom-use-tmp-storage was introduced and the times at which the various operators gained spill support may differ, so the documentation does not detail which operators are supported from which version. It is recommended to test on the latest version you can.
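
One concrete way to check, following the suggestion above (the quota value and table name are hypothetical): TiDB’s `EXPLAIN ANALYZE` output includes a `disk` column per operator, so a non-`N/A` value there indicates temporary disk space was actually used.

```sql
-- Lower the per-query memory quota so the operator is forced to spill
-- (value in bytes; 100 MiB here is an arbitrary test value).
SET tidb_mem_quota_query = 104857600;

-- Inspect the `disk` column for Sort / HashJoin / HashAgg operators.
EXPLAIN ANALYZE SELECT * FROM t ORDER BY c;
```

Checking whether files appear under the configured tmp-storage-path while the query runs is another way to confirm it.
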

| username: 人如其名 | Original post link

Would it be better to place the disk-spill behavior under actionMuForSoftLimit, the soft limit? Currently, only the aggregation operator’s AggSpillDiskAction in the non-concurrent case uses the soft limit; all other spill actions sit under actionMuForHardLimit, at the same level as PanicOnExceed. As a result, in many scenarios the query is canceled before it ever gets a chance to spill to disk.

The soft limit’s softScale is fixed at 80% of the hard limit. It would be better if the spill actions could be placed under the soft limit and this ratio made user-configurable.
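
The two-tier layout described above can be sketched as follows. This is a hedged illustration of the concept, not TiDB’s actual Go implementation; the class and method names are invented for the example, while AggSpillDiskAction and PanicOnExceed are the real action names from the discussion.

```python
# Illustrative sketch of a memory tracker with soft/hard action tiers.
class Tracker:
    def __init__(self, hard_limit, soft_scale=0.8):
        self.hard = hard_limit
        self.soft = int(hard_limit * soft_scale)  # softScale fixed at 80%
        self.soft_actions = []  # e.g. AggSpillDiskAction (non-concurrent agg only)
        self.hard_actions = []  # other spill actions live here, next to PanicOnExceed

    def consume(self, used):
        if used >= self.hard:
            for act in self.hard_actions:  # spill competes with cancel at this tier
                act(used)
        elif used >= self.soft:
            for act in self.soft_actions:  # spill gets a chance before any cancel
                act(used)

t = Tracker(1 << 30)  # 1 GiB statement quota -> soft limit ~819 MiB
events = []
t.soft_actions.append(lambda u: events.append("spill"))
t.hard_actions.append(lambda u: events.append("cancel"))
t.consume(900 << 20)  # 900 MiB: above soft, below hard -> spill fires
t.consume(1 << 30)    # at the hard limit -> cancel fires
print(events)         # ['spill', 'cancel']
```

The complaint in the post is that most spill actions are registered in `hard_actions` rather than `soft_actions`, so cancellation can win the race before spilling starts.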

| username: Raymond | Original post link

> For example, if memory usage is already close to the OOM threshold when spilling starts, and spilling to disk cannot keep up with the rate of memory allocation, the query still results in OOM.

I would like to ask: assuming tidb_mem_quota_query is 1 GB, at how many MB of memory usage does it start to consider spilling to disk? How is this threshold calculated?

| username: 人如其名 | Original post link

In version 6.3, single-threaded aggregation spills to disk once it reaches 80% of the statement’s memory quota; other kinds of spilling trigger only when 100% of the quota is reached.
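
In bytes, the 80% soft threshold for a 1 GiB quota works out as below (a quick check assuming the 80% figure above; the variable names are just for illustration):

```python
quota = 1 << 30              # tidb_mem_quota_query = 1 GiB = 1073741824 bytes
spill_at = int(quota * 0.8)  # soft limit: 80% of the statement quota
print(spill_at)              # 858993459 bytes, roughly 819 MiB
```

So with a 1 GB quota, single-threaded aggregation would begin spilling at around 820 MB, while the other operators spill only at the full 1 GB.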

| username: Raymond | Original post link

So is it possible that while the statement is spilling to disk, its memory usage continues to rise, eventually causing the TiDB server to OOM?

| username: 人如其名 | Original post link

Such situations exist, but they usually don’t happen, because writing to disk has relatively high priority: if data is being written to disk and more data arrives, it waits for the disk write to complete, and if the disk write has already completed, lower-priority actions such as cancel are triggered.

The main scenario that easily leads to OOM (assuming you have set oom-action to cancel) is memory that is not tracked at all. For example, during the matching phase of a hash join, one row may match many rows in the right-side table, and memory is not tracked during matching. Although this situation is rare (outdated statistics causing the optimizer to place the large table on the right side cannot be ruled out), encountering it may lead to OOM.

| username: h5n1 | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.