I thought it’ll be faster than “INSERT IGNORE INTO mytable SELECT”
but looks like wrong.
here mytable is a temporary table created by “CREATE TEMPORARY TABLE” only has one bigint column,
2 questions:
is “import into mytable from select” support temporary table ?
while this should be easy to do the tests by myself, I’d like to get confirmation from tidb official document. I guess not support.
performance issue as the topic
with tidb 8.1,
A. “IMPORT INTO mytable FROM SELECT …”, which mytable is a normal table not temporary table
B. create temporary table tmp_table(id bigint);
INSERT IGNORE INTO tmp_table select …
which is faster between A and B ?
to my surprise, it looks B is faster than A from my quick tests.
The IMPORT INTO statement in TiDB is designed to import data into existing empty tables. However, it does not support importing data into temporary tables created with CREATE TEMPORARY TABLE. The official documentation specifies that IMPORT INTO only supports importing data into existing empty tables.
2. Performance Comparison
A. IMPORT INTO mytable FROM SELECT ... (Normal Table)
The IMPORT INTO statement is optimized for bulk data import and can leverage the distributed execution framework in TiDB to split the import job into multiple sub-jobs, distributing them across different TiDB nodes for execution. This can significantly improve the efficiency of large-scale data imports.
B. INSERT IGNORE INTO tmp_table SELECT ... (Temporary Table)
Temporary tables in TiDB are stored in memory, which can lead to faster data manipulation operations compared to disk-based operations on normal tables. The INSERT IGNORE INTO statement directly inserts data into the temporary table, which might be faster due to the in-memory nature of temporary tables and the reduced overhead of managing disk I/O.
Insights on Performance Difference
The observed performance difference where option B (INSERT IGNORE INTO tmp_table SELECT ...) is faster than option A (IMPORT INTO mytable FROM SELECT ...) can be attributed to several factors:
In-Memory Operations: Temporary tables are stored in memory, leading to faster read/write operations compared to disk-based normal tables.
Overhead of IMPORT INTO: The IMPORT INTO statement involves additional overhead for data validation, sorting, and distributed execution, which might not be as efficient for smaller datasets or simpler operations compared to direct inserts into temporary tables.
Optimization for Bulk Imports: IMPORT INTO is optimized for large-scale data imports and might not show significant performance benefits for smaller or simpler data import tasks.
Conclusion
Based on the provided context and official documentation, IMPORT INTO does not support importing data into temporary tables. For performance, using INSERT IGNORE INTO with temporary tables can be faster due to the in-memory nature of temporary tables and reduced overhead compared to the IMPORT INTO statement.