Suggestions for Handling Out of Memory Errors When Bulk Inserting Data in a Single SQL Statement

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 大批量插入数据报单条sql内存不足错误的建议

| username: wenyi

[TiDB Usage Environment] Testing
[TiDB Version] 7.1.1

When copying a table in TiDB, i.e., inserting a large batch of data into a new table, an error occurs indicating insufficient memory for a single SQL statement. My suggestions are as follows:

  1. Why doesn’t TiDB learn from MySQL or Oracle and use temporary files? Does everything have to be done in memory? Reporting such errors is not optimized.
  2. Is there a parameter control? For example, if a single SQL statement uses a certain amount of memory, it writes to a temporary file. This would improve user experience. MySQL, Oracle, etc., seem to handle this well. I suggest TiDB learn from them.
| username: 像风一样的男子 | Original post link

You can try using batch statements to split DML statements

| username: Kongdom | Original post link

This parameter controls

| username: zhanggame1 | Original post link

I support this. TiDB should indeed use temporary files here. Batch processing is not very effective. It would be best to also support “CREATE TABLE AS SELECT”. With Oracle, this amount of data would not cause an OOM.

| username: TiDBer_oHSwKxOH | Original post link

Distributed and monolithic databases are different. TiDB is equivalent to a collector that collects data and then outputs it. Take your time. If it’s an optimization suggestion, you can give it to your cousin.

| username: wenyi | Original post link

This parameter only increases the memory used by a single SQL, it does not mean that a single SQL will start writing temporary files after using this much memory.

| username: Kongdom | Original post link

This is not writing temporary files.

| username: 裤衩儿飞上天 | Original post link

Sorry, I can’t translate images. Please provide the text you need translated.

| username: wenyi | Original post link

Connect to that TiDB node, temporarily store it on that node, delete the temporary file once the statement is completed, or use it in a cyclic manner similar to Oracle’s undo tablespace.

| username: Kongdom | Original post link

You can make suggestions in this section. After all, temporary tables have been implemented, so there should be a better solution here~

| username: redgame | Original post link

Use tidb_enable_tmp_storage_on_oom.

| username: tony5413 | Original post link

Support, it reports this kind of error with just over 6 million records.

| username: tidb菜鸟一只 | Original post link

  1. Why doesn’t TiDB follow MySQL or Oracle and use temporary files, insisting on completing everything in memory? Reporting such errors is not optimized.
    Temporary files also have drawbacks: first, they are inefficient, and second, recovery is slower if an exception occurs. Using temporary files can avoid errors, but many people cannot accept the performance. Of course, TiDB also supports using temporary files, which requires you to adjust parameters to control it.

  2. Are there parameters to control this? For example, writing to temporary files when a single SQL query uses a certain amount of memory, which improves user experience. MySQL, Oracle, etc., seem to handle this well, and it is suggested that TiDB learn from them.
    tidb_mem_quota_query is used to set the maximum memory usage for a SQL query, and tidb_enable_tmp_storage_on_oom is used to set whether to use temporary files when memory is exceeded.

| username: TiDBer_vfJBUcxl | Original post link

Add conditions to the query statement of the t_dch_patient_card table, allowing the t_dch_patient_card table to be imported multiple times. At the same time, use the system variable tidb_mem_quota_query to configure the memory usage threshold during the execution of a SQL statement.
https://docs.pingcap.com/zh/tidb/stable/configure-memory-usage#如何配置一条-sql-执行过程中的内存使用阈值

| username: zhanggame1 | Original post link

MySQL or Oracle have temporary tablespaces and undo, TiDB can consider this.

| username: longzhuquan | Original post link

The underlying storage logic is different: one is BTREE for a single machine, and the other is LSM for distributed systems. Consider what the underlying layer is suitable for before discussing the upper layer redo and undo. You can’t have your cake and eat it too.

| username: tidb菜鸟一只 | Original post link

For temporary tablespace, it essentially means creating a disk file on the disk to perform memory tasks. The advantage is that it is less likely to report errors, but the downside is that the efficiency decreases. TiDB now also supports this mode.

As for undo, both MySQL and Oracle place the modified data into a separately allocated undo disk file, and other sessions directly query this. TiDB, on the other hand, uses MVCC (Multi-Version Concurrency Control) to handle this. The modified data is placed in the data file, and other sessions can directly query it.

| username: cassblanca | Original post link

The tidb_enable_tmp_storage_on_oom parameter only supports the use of temporary disk for certain operators, not all operations are supported.