The execution of the insert into select statement is particularly slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: insert into select语句执行特别慢

| username: TiDBer_o0MXVuK4

[TiDB Usage Environment] Testing
[TiDB Version] 6.1.2
[Encountered Problem: Phenomenon and Impact]
Executing the insert into select statement is particularly slow, and no other operations are being performed. The statement is as follows:

insert into its_sjzt_jg.DW1_MX_JF_YSXX_YF (ID, XMBH, XMMC, KMMC, KMJCID, WBS, JH, DQSH, SCJH, TXSM, BZ, TBRQ, BBH, XMID, SJID, XMLX_DM, YJYWLX_DM, EJYWLX_DM, SJYWLX_DM, SJYWLXX_DM, QYFRT_DM, KHJLITCODE, GZSYB_DM, XMCBZX_DM, SFARYJS, XMBQ, XMZBQ, SQZCNR, JFFZRITCODE, XMFZRITCODE, SQZCKSRQ, SQZCJSRQ, QYCQYXMH, KJZHTXMH, SFXYYS, SL, HL, HTYDFWKSRQ, HTYDFWJSRQ, WGBL, QYZT_DM, SWZT_DM, JFZT_DM, CWZT_DM, XMQYRQ, XMLXRQ, XMJXRQ, XMQDRQ, XMGBRQ, XMZTRQ, XMZFRQ, GZJFB_DM, JFBMCBZX_DM, JFLX_DM, ZZXMJLITCODE, ZZXMZJITCODE, ZZXMJLSJH, JFJHKSRQ, JFJHJSRQ, XMZTXDZ, ZHGXSJ, ZHGXCZRITCODE, SZBG, QYKHID, ZZYHID, XMDL_DM, XMXL_DM, HTID, GZJJFAB_DM, FZXMZJITCODE, JFSSCS, XMJLITCODE, XYLXYJ_DM, XYLXEJ_DM, PGKSRQ, PGJSRQ, JFJKR, XMXX, ZZJZLJLITCODE, XYLXSJ_DM, GLCPMC, GLCPBB, FZXMJLITCODE, XMJZLJLITCODE, XMXLX_DM, XMGLJB, KHX, CPXYWX, GLXM, GLYY, SRQRFS_DM, SFKJZHTXDZXM, SFKJZHT, SFXYJSSQZC, QYLX_DM, NDYF, ND, YF, JFBG, IDF_FLAG, IDF_XH, IDF_ACTIVITY_NAME, IDF_BIZ_PRIMARY)
SELECT t.ID, d.XMBH, d.XMMC, t.KMMC, t.KMJCID, t.WBS, t.JH, t.DQSH, t.SCJH, t.TXSM, t.BZ, t.TBRQ, t.BBH, d.XMID, d.SJID, d.XMLX_DM, d.YJYWLX_DM, d.EJYWLX_DM, d.SJYWLX_DM, d.SJYWLXX_DM, d.QYFRT_DM, d.KHJLITCODE, d.GZSYB_DM, d.XMCBZX_DM, d.SFARYJS, d.XMBQ, d.XMZBQ, d.SQZCNR, d.JFFZRITCODE, d.XMFZRITCODE, d.SQZCKSRQ, d.SQZCJSRQ, d.QYCQYXMH, d.KJZHTXMH, d.SFXYYS, d.SL, d.HL, d.HTYDFWKSRQ, d.HTYDFWJSRQ, d.WGBL, d.QYZT_DM, d.SWZT_DM, d.JFZT_DM, d.CWZT_DM, d.XMQYRQ, d.XMLXRQ, d.XMJXRQ, d.XMQDRQ, d.XMGBRQ, d.XMZTRQ, d.XMZFRQ, d.GZJFB_DM, d.JFBMCBZX_DM, d.JFLX_DM, d.ZZXMJLITCODE, d.ZZXMZJITCODE, d.ZZXMJLSJH, d.JFJHKSRQ, d.JFJHJSRQ, d.XMZTXDZ, d.ZHGXSJ, d.ZHGXCZRITCODE, d.SZBG, d.QYKHID, d.ZZYHID, d.XMDL_DM, d.XMXL_DM, d.HTID, d.GZJJFAB_DM, d.FZXMZJITCODE, d.JFSSCS, d.XMJLITCODE, d.XYLXYJ_DM, d.XYLXEJ_DM, d.PGKSRQ, d.PGJSRQ, d.JFJKR, d.XMXX, d.ZZJZLJLITCODE, d.XYLXSJ_DM, d.GLCPMC, d.GLCPBB, d.FZXMJLITCODE, d.XMJZLJLITCODE, d.XMXLX_DM, d.XMGLJB, d.KHX, d.CPXYWX, d.GLXM, d.GLYY, d.SRQRFS_DM, d.SFKJZHTXDZXM, d.SFKJZHT, d.SFXYJSSQZC, d.QYLX_DM, d.NDYF, d.ND, d.YF, d.JFBG, 'I', 1, 'ETL_CONF_DW1_MX_JF_YSXX_YF', CONCAT(IFNULL(t.KMJCID, ''), '@##@', IFNULL(d.XMBH, ''), '@##@', d.NDYF, '@##@', IFNULL(t.BBH, ''))
FROM its_sjzt_jg.DW0_JF_YSXX t
INNER JOIN its_sjzt_jg.DW1_MX_JF_YSXX_YF_ZJ_1 d
ON t.ID = d.XMID
AND d.TJJZRQ_JF <= t.YXQ_Z
AND d.TJJZRQ_JF >= t.YXQ_Q
AND t.YXQ_Z >= '2022-02-09 00:00:00'

[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]

  1. The data volume of the two tables involved in the query is as follows:
select count(*) from its_sjzt_jg.DW0_JF_YSXX -- 12140331
select count(*) from its_sjzt_jg.DW1_MX_JF_YSXX_YF_ZJ_1 -- 1095186
  1. Execution Plan

  2. Monitoring

  3. Cluster Configuration
    10 TiKV nodes, no TiFlash nodes, each TiKV node has 16 cores and 64GB. During the task execution, the CPU usage is minimal, but one TiDB node’s memory usage increases rapidly. The TiDB node has 32 cores and 128GB of memory.

| username: caiyfc | Original post link

You can first check if the select part is taking a long time. If it is, you can see if it’s because the statistics are inaccurate, leading to an index join, which causes the query to be slow.
If the select part is not slow, then you need to batch insert. You can add other filtering conditions or perform batch inserts by the hour.

| username: 张雨齐0720 | Original post link

Explain the SELECT statement to see if the execution plan is suboptimal. First, optimize the SELECT, then check the execution plan of the EXPLAIN INSERT.
Of course, you can also execute the complete statement for analysis and then check the monitoring to see where the time is being consumed.

| username: 我是咖啡哥 | Original post link

Writing it this way can easily cause OOM. You can try using Tispark. We have a project that often encounters this problem as well, it’s quite a headache…

| username: tidb菜鸟一只 | Original post link

The statistics for the table its_sjzt_jg.DW0_JF_YSXX might be inaccurate. I noticed that it used the YXQ_Z index. You can try recollecting the statistics or use a hint to specify a full table scan.

| username: TiDBer_o0MXVuK4 | Original post link

I am also planning to switch to Tispark.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.