When copying a table in TiDB, that is, inserting a large batch of data into a new table with a single statement, an error occurs indicating that the statement has run out of memory. My suggestions are as follows:
Why doesn’t TiDB learn from MySQL or Oracle and use temporary files? Does everything have to be done in memory? Simply reporting an error here is a poor experience.
Is there a parameter to control this? For example, once a single SQL statement uses a certain amount of memory, it could spill to a temporary file. That would improve the user experience. MySQL, Oracle, and others seem to handle this well; I suggest TiDB learn from them.
I support this. TiDB should indeed use temporary files here; splitting the load into batches is not a very practical workaround. It would be best to also support CREATE TABLE AS SELECT. In Oracle, this amount of data would not cause an OOM.
Distributed and monolithic databases are different: TiDB essentially acts as a collector that gathers data from the storage nodes and then returns it. Take your time. If this is an optimization suggestion, you can submit it to the TiDB team.
This parameter only raises the amount of memory a single SQL statement may use; it does not mean the statement starts writing temporary files once it has used that much memory.
Spill to the TiDB node the client is connected to, store the data there temporarily, and delete the temporary file once the statement completes, or reuse the space cyclically, much like Oracle’s undo tablespace.
Why doesn’t TiDB follow MySQL or Oracle and use temporary files, insisting on completing everything in memory? Simply reporting an error is a poor experience.
Temporary files have drawbacks too: first, they are slow, and second, recovery after a failure takes longer. They avoid the error, but many users cannot accept the performance hit. That said, TiDB does support spilling to temporary files; you need to adjust parameters to enable and control it.
Are there parameters to control this? For example, writing to temporary files once a single SQL statement uses a certain amount of memory would improve the user experience; MySQL, Oracle, and others seem to handle this well, and TiDB should learn from them. There are: tidb_mem_quota_query sets the maximum memory a single SQL statement may use, and tidb_enable_tmp_storage_on_oom controls whether to spill to temporary files on disk when that quota is exceeded.
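As a sketch, the two variables mentioned above can be set per session or globally; the specific values here are illustrative, not recommendations, and defaults vary by TiDB version:

```sql
-- Raise the memory quota for a single SQL statement in this session
-- to 4 GiB (the value is illustrative; recent versions default to 1 GiB).
SET SESSION tidb_mem_quota_query = 4294967296;

-- Allow operators to spill to temporary disk files instead of failing
-- when the quota is exceeded (recent versions enable this by default).
SET GLOBAL tidb_enable_tmp_storage_on_oom = ON;
```

With spilling enabled, a statement that exceeds the quota degrades to disk-backed execution rather than aborting with an out-of-memory error, at the cost of speed, which matches the trade-off described above.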
Add conditions to the query against the t_dch_patient_card table so the data can be imported in several batches. At the same time, use the system variable tidb_mem_quota_query to configure the memory usage threshold for a single SQL statement: https://docs.pingcap.com/zh/tidb/stable/configure-memory-usage#如何配置一条-sql-执行过程中的内存使用阈值
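A batched import along those lines might look like the sketch below. Only the table name t_dch_patient_card comes from the post; the target table name, the id column, and the range boundaries are hypothetical and should be replaced with the table's actual key:

```sql
-- Cap per-statement memory for this session (illustrative value: 2 GiB).
SET SESSION tidb_mem_quota_query = 2147483648;

-- Import in slices instead of one giant INSERT ... SELECT,
-- so no single statement has to hold the whole table in memory.
-- The id ranges below are hypothetical; split on whatever key exists.
INSERT INTO t_dch_patient_card_copy
SELECT * FROM t_dch_patient_card WHERE id >= 1      AND id < 100000;

INSERT INTO t_dch_patient_card_copy
SELECT * FROM t_dch_patient_card WHERE id >= 100000 AND id < 200000;
```

Each slice commits independently, so a failure only requires re-running the affected range rather than the whole copy.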
The underlying storage is different: one is a single-machine B-tree, the other a distributed LSM-tree. Consider what the storage layer is suited for before discussing redo and undo at the upper layer. You can’t have your cake and eat it too.
A temporary tablespace essentially creates a file on disk to do work that would otherwise be done in memory. The advantage is that errors are less likely; the drawback is lower efficiency. TiDB now supports this mode as well.
As for undo, MySQL and Oracle both write the pre-modification version of the data into a separately allocated undo file on disk, and other sessions read it from there. TiDB instead relies on MVCC (Multi-Version Concurrency Control): old versions of modified rows are kept in the data files themselves, where other sessions can read them directly.