Enhancing Concurrent Execution Capability of Non-Transactional DML Functions

translator_bot · June 21, 2024, 1:12pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 非事务DML功能增加并发执行能力

| username: h5n1

It is recommended to add concurrency control capabilities to non-transactional DML functions, such as adding a parallel xx option to control the concurrency of executing grouped SQL to speed up execution. Currently, it executes serially after grouping, and manually splitting the range is also quite troublesome, which does not reflect the advantage of this feature.

Additionally, is there anything special about the execution plan for non-transactional DML execution? Previously, when importing a table and then inserting into another table using insert into select, with user_id as the secondary index and grouping every 10,000 rows, it was found that the grouped SQL used a full table scan during actual execution. However, when these SQL statements were executed individually using explain\explain analyze, they could use the index. After collecting statistics, non-transactional DML can use the index normally.

translator_bot · June 21, 2024, 1:12pm

| username: 小龙虾爱大龙虾 | Original post link

IMPORT INTO | PingCAP 归档文档站,

The import into syntax may later support import into xxx select xxx, significantly speeding up processing with the help of the lightning local mode.

translator_bot · June 21, 2024, 1:12pm

| username: zhanggame1 | Original post link

The IMPORT INTO syntax is just for handling data files, right?

translator_bot · June 21, 2024, 1:12pm

| username: 小龙虾爱大龙虾 | Original post link

Yes, it is not supported yet.

translator_bot · June 21, 2024, 1:12pm

| username: h5n1 | Original post link

The syntax “import into xxx select xxx” is not mentioned in the documentation.

translator_bot · June 21, 2024, 1:12pm

| username: 小龙虾爱大龙虾 | Original post link

Yes, it is not supported yet.

translator_bot · June 21, 2024, 1:12pm

| username: dba远航 | Original post link

The suggestion is good, but controlling DML concurrency in a distributed database is more challenging than in a regular single database.

translator_bot · June 21, 2024, 1:12pm

| username: zhaokede | Original post link

Adding concurrency in SQL is very useful when system cutover and data processing are required during system go-live.

translator_bot · June 21, 2024, 1:12pm

| username: oceanzhang | Original post link

I think TiDB’s performance lies in reading rather than writing, there’s no need to learn to be like Oracle.

translator_bot · June 21, 2024, 1:12pm

| username: forever | Original post link

It’s not about learning from someone, it’s about solving practical scenarios. For example, we often need to extract some data from one table to a backup table and then process the data.