Slow SQL Consultation

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 慢SQL咨询

| username: Hacker_ythhI9Qd

To improve efficiency, please provide the following information. Clear problem descriptions can be resolved faster:
【TiDB Usage Environment】
TiKV, 32C 64G

【Phenomenon】 Business and database phenomena
3 slow SQLs

【TiDB Version】
5.7.25-TiDB-v4.0.10

  1. Insert takes more than 1 second, with about 20 million rows in a single table;
    insert into t_invoice_kj_record (msg_id, tax_disk, real_tax_disk, kpr_id, fplxdm, kplx, tspz, xhdwsbh, ghdwmc, ghdwsbh, ghdwdz, ghdwdh, ghdwyh, ghdwzh, hjje, hjse, jshj, bz, qdbz, kpjh, sign ) values (…)

Execution Plan:

| id | task | estRows | operator info | actRows | execution info | memory | disk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Insert_1 | root | 0 | N/A | 0 | time:419.505µs, loops:1, prepare:282.66µs, insert:136.845µs, commit_txn: {prewrite:1.718988742s, wait_prewrite_binlog:229ns, get_commit_ts:275.08µs, commit:3.312319ms, region_num:2, write_keys:2, write_byte:538} | 3.8310546875 KB | N/A |

  2. Insert … on duplicate key update is slow, with about 500+ rows in a single table;
    insert into t_plugin_online_record (disk_type, tax_disk, disk_num, version, xhdwsbh, xhdwmc, plugin_version) values (xx) on duplicate key update disk_type=1,disk_num='',version='',xhdwsbh='',xhdwmc='',plugin_version='',last_time=now();

Execution Plan:
| id | task | estRows | operator info | actRows | execution info | memory | disk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Insert_1 | root | 0 | N/A | 0 | time:36.579234809s, loops:2, prepare:3.982522ms, check_insert:{total_time:36.575252287s, mem_insert_time:408.598µs, prefetch:36.574843689s, rpc:{BatchGet:{num_rpc:6, total_time:22.632265ms}}}, commit_txn: {prewrite:881.108µs, wait_prewrite_binlog:6.997531ms, region_num:1, write_keys:1, write_byte:149, txn_retry:1} | 1.0927734375 KB | N/A |

Table Structure:
CREATE TABLE t_plugin_online_record (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  disk_type tinyint(4) NOT NULL COMMENT 'Tax Disk Type',
  tax_disk varchar(12) COMMENT 'Tax Disk Number',
  disk_num varchar(3) COMMENT 'Invoice Machine Number',
  version varchar(15) COMMENT 'Version Number',
  xhdwsbh varchar(20) COMMENT 'Tax Number',
  xhdwmc varchar(100) COMMENT 'Taxpayer Name',
  ip varchar(40) DEFAULT '' COMMENT 'IP',
  plugin_version varchar(10) DEFAULT '' COMMENT 'Version Number 1',
  create_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation Time',
  last_time datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Modification Time',
  PRIMARY KEY (id),
  UNIQUE KEY uniq_tax_disk (tax_disk)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=2327025 COMMENT='Status Record Table';

  3. Update by id is slow, id is the primary key, with about 20 million rows in a single table;
    update t_plugin_transfer set status=1,result='' where id=19704790
    Execution Plan:

    | id | task | estRows | operator info | actRows | execution info | memory | disk |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Update_2 | root | 0 | N/A | 0 | time:775.147µs, loops:1, commit_txn: {prewrite:2.177618ms, wait_prewrite_binlog:1.236378809s, get_commit_ts:346.903µs, commit:1.613045ms, region_num:1, write_keys:1, write_byte:3650} | 4.44921875 KB | N/A |
    | └─Point_Get_1 | root | 1 | table:t_plugin_transfer, handle:19704790 | 1 | time:618.636µs, loops:2, Get:{num_rpc:1, total_time:559.972µs} | N/A | N/A |

Regarding insert ... on duplicate key update: the official documentation does not recommend it, but it is still needed in some batch data-processing scenarios. Splitting it into a check for existence followed by a separate insert or update would greatly reduce efficiency.
For the inserts that take more than 1 second under heavy concurrent writes, the id could be set to AUTO_RANDOM in version 5.x. Are there any other suggestions?
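
As a rough sketch of the AUTO_RANDOM idea mentioned above (the `_new` table name, the abbreviated column list, and the column types are illustrative placeholders; in TiDB, AUTO_RANDOM is normally declared when the table is created, on a bigint primary key):

    -- Hedged sketch only: a random primary key scatters concurrent inserts
    -- across Regions instead of appending them all to the tail Region that an
    -- AUTO_INCREMENT key produces.
    CREATE TABLE t_invoice_kj_record_new (
      id     BIGINT NOT NULL AUTO_RANDOM,
      msg_id VARCHAR(48) NOT NULL,        -- remaining business columns omitted
      PRIMARY KEY (id)
    );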

| username: xfworld | Original post link

  1. t_invoice_kj_record
    The insert operation itself is not slow, taking time: 419.505µs;
    Regarding this, I have a couple of questions:
    a. Is there a hotspot issue?
    b. Is batch insertion sufficiently optimized?

  2. The t_plugin_online_record table has a very small amount of data, only 500+ rows…
    But the scenario expects to support both insert and update. Looking at the execution plan for why it is slow:
    time: 36.579234809s total time
    check_insert: {total_time: 36.575252287s, mem_insert_time: 408.598µs, prefetch: 36.574843689s, rpc: {BatchGet: {num_rpc: 6, total_time: 22.632265ms}}}
    The check_insert phase accounted for almost all of the time… within it, the BatchGet RPC was called 6 times, taking only 22.632265ms.
    Questions:

a. Is there a hotspot issue?
Version 4.x also has a clustered index, but it is implicit (the integer primary key serves as the row handle). If a hotspot is confirmed, it is recommended to switch to auto_random.

b. Is there room for optimization at the business level? Since the unique identifier is already known, the processing layer can simply switch to batch inserts and batch updates; as long as each transaction is not too large, that should resolve the slowness.
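
For illustration, one possible shape of such a batched upsert (the values and the batch size are placeholders; the column list follows the table definition earlier in the thread):

    -- Sketch: upsert several rows per statement instead of one row at a time,
    -- keeping each batch small enough that the transaction stays cheap.
    INSERT INTO t_plugin_online_record
      (disk_type, tax_disk, disk_num, version, xhdwsbh, xhdwmc, plugin_version)
    VALUES
      (1, 'disk-001', '1', '1.0', 'tax-001', 'name-001', '1.0'),  -- placeholder rows
      (1, 'disk-002', '1', '1.0', 'tax-002', 'name-002', '1.0')
    ON DUPLICATE KEY UPDATE
      disk_type      = VALUES(disk_type),
      disk_num       = VALUES(disk_num),
      version        = VALUES(version),
      xhdwsbh        = VALUES(xhdwsbh),
      xhdwmc         = VALUES(xhdwmc),
      plugin_version = VALUES(plugin_version),
      last_time      = NOW();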

  3. What type of structure is t_plugin_transfer?
    According to the plan you provided, it is not slow.
    commit_txn: {prewrite: 2.177618ms, wait_prewrite_binlog: 1.236378809s, get_commit_ts: 346.903µs, commit: 1.613045ms, region_num: 1, write_keys: 1, write_byte: 3650}
    All are in ms units…

A bigger question: does the overall configuration of the current environment follow the official requirements?

| username: Hacker_ythhI9Qd | Original post link

  1. t_invoice_kj_record
    a. This table involves simple insertions and updates. The primary key is auto-incremented, which might cause hotspots. Besides setting auto_random, are there any other methods?
    b. Currently, it does not involve batch inserts. If it does, batch inserts will generally be considered.

  2. t_plugin_online_record
    This table has an id primary key and a unique key; it seems that removing the id primary key could also reduce conflicts. In addition, there is a business scenario that integrates about 6 million business records each time and decides, based on the unique key, whether to insert or update. Not using insert … on duplicate key update would probably be slower. Are there any good suggestions?

  3. t_plugin_transfer table structure
    CREATE TABLE t_plugin_transfer (
      id bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'id',
      tenant_id varchar(64) NOT NULL DEFAULT '' COMMENT 'tenant id',
      request_type tinyint(4) NOT NULL DEFAULT '1' COMMENT 'request type',
      msg_id varchar(48) NOT NULL COMMENT 'message id (unique for each request)',
      disk_type tinyint(4) NOT NULL COMMENT 'tax disk type',
      tax_disk varchar(12) NOT NULL COMMENT 'tax disk number',
      record longtext NOT NULL COMMENT 'forwarding message',
      sign char(32) NOT NULL DEFAULT '' COMMENT 'MD5 signature',
      status tinyint(4) DEFAULT '0' COMMENT 'processing status',
      result varchar(4000) DEFAULT '' COMMENT 'error reason or response result',
      create_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
      update_time datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'modification time',
      PRIMARY KEY (id),
      KEY idx_msg_id (tenant_id,msg_id)
    ) ENGINE=InnoDB AUTO_INCREMENT=633008 COMMENT='request forwarding record table';

  4. TiDB was set up by the DBA and should meet the official requirements for OLTP and OLAP. The current configuration is 32C 64G with 2 nodes; TiFlash is not used, and the TiCDC status is unknown.

| username: xfworld | Original post link

  • According to your description, t_plugin_online_record already has a system-level unique primary key plus a business unique key, so that in itself should not be a source of conflict, and identifying the business record is necessary. Is there any middleware besides the database that could help? Optimizing the business model and reducing the number of queries would be more efficient.

  • For t_invoice_kj_record, if you want high performance, either use the officially provided auto_random or generate IDs with a Snowflake-style algorithm, following the principle of scattering the high bits, which also supports dynamic sharding. AUTO_INCREMENT is suited to scenarios with low performance requirements and is generally kept only for compatibility during migration.

  • It is recommended to consult a DBA to see if there is any way to support you… :star_struck:

| username: Hacker_ythhI9Qd | Original post link

There are Redis and Kafka available. This table is mainly for business records; the data volume is not large and it is only queried when needed, usually just inserted or updated, which is why we chose insert ... on duplicate key update. Later we can switch to auto_random, or check whether a row exists first and then decide whether to insert or update.
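
As a hedged sketch of that "check first, then insert or update" flow (values are placeholders; the application would split the batch into missing and existing rows based on the first query):

    -- 1) Find which keys of the current batch already exist.
    SELECT tax_disk
    FROM   t_plugin_online_record
    WHERE  tax_disk IN ('disk-001', 'disk-002', 'disk-003');

    -- 2) Batch-insert the rows whose tax_disk was NOT returned above.
    INSERT INTO t_plugin_online_record
      (disk_type, tax_disk, disk_num, version, xhdwsbh, xhdwmc, plugin_version)
    VALUES (1, 'disk-003', '1', '1.0', 'tax-003', 'name-003', '1.0');

    -- 3) Update the rows whose tax_disk WAS returned above.
    UPDATE t_plugin_online_record
    SET    version = '1.0', plugin_version = '1.0', last_time = NOW()
    WHERE  tax_disk IN ('disk-001', 'disk-002');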

For the scenario of integrating over 6 million business records each time and deciding whether to insert or update based on the unique key, this kind of batch insert-or-update feels like it will be quite slow.

| username: xfworld | Original post link

What kind of scenario involves integrating over 6 million business data entries each time?

| username: Hacker_ythhI9Qd | Original post link

6 months of accounts receivable data integration

| username: xfworld | Original post link

That is also a result of merging N business data and dimensions, right?

It’s not too bad :custard:

If this data will be retained long term, I suggest enabling TiFlash to speed up queries;
however, your version is too old. I recommend using version 5.4.x, as it has fewer issues.

| username: Hacker_ythhI9Qd | Original post link

My description may have been imprecise. Each time we take 6 months of data (placed in table A) and compare it with the already-integrated business table B; based on the unique key of the two tables, rows from table A are inserted into or used to update table B. The difference between table A and table B is generally in the tens of thousands of rows, and around month boundaries it reaches the millions, so the number of rows inserted or updated in table B ranges from tens of thousands to millions.
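
A set-based way of expressing that A-to-B merge might look roughly like the following (stage_receivable stands in for table A, t_receivable for table B, and biz_key/amount are hypothetical columns; in practice the statement is usually chunked by key range so that a single transaction does not grow too large):

    -- Hypothetical sketch: merge the staging table into the target in one pass,
    -- letting the unique key of t_receivable decide between insert and update.
    INSERT INTO t_receivable (biz_key, amount, update_time)
    SELECT a.biz_key, a.amount, NOW()
    FROM   stage_receivable a
    ON DUPLICATE KEY UPDATE
      amount      = VALUES(amount),
      update_time = VALUES(update_time);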

| username: xfworld | Original post link

There are two concepts for merging data:

  1. Business data updates
  2. Statistical data updates

If it’s the first type, there will be some issues regardless of which database you use. The processing will inevitably require various optimizations, and you will encounter the following difficulties:

  • Processing is not fast enough, and timeliness is not good
  • Very large transactions, prone to OOM

If it's the second type, similar to metric results, the writes are generally append-only, and upsert is not common.

| username: Hacker_ythhI9Qd | Original post link

For upsert operations, MySQL takes about 8 minutes for around 10,000 records and about 50 minutes for millions of records. In such scenarios, TiDB might take even longer.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. No new replies are allowed.