[TiDB Usage Environment] Production environment, testing
[TiDB Version] v5.1.0
[Encountered Problem] The following index creation command in TiDB v5.1.0 has been executing for 4-5 days and has not yet completed. Is there any way to check its progress?
Image 1: Index creation command
There was a similar case before, caused by a bug where GC could not delete multi-version data. For more details, see the link: https://asktug.com/t/topic/903353/22
If the slow index addition is not caused by a bug, you can allocate more resources to ADD INDEX using the following method, but it will affect online business.
According to the prompt, the following command should be used during the create index process, but unfortunately, the create index command has already been executed.
set tidb_ddl_reorg_priority=‘PRIORITY_HIGH’;
Test cases for adjusting tidb_ddl_reorg_worker_cnt and tidb_ddl_reorg_batch_size on the official website regarding add index under write load and query load
Relevant Parameters
tidb_ddl_reorg_worker_cnt
Scope: GLOBAL
Default value (before version 2.1.17): 16 Default value (version 2.1.17 and later): 4
This variable is used to set the concurrency of the re-organize phase of DDL operations.
tidb_ddl_reorg_batch_size
Scope: GLOBAL
Default value (before version 2.1.17): 1024 Default value (version 2.1.17 and later): 256
This variable is used to set the batch size for the re-organize phase of DDL operations. For example, the Add Index operation requires backfilling index data, which is done by multiple workers concurrently (tidb_ddl_reorg_worker_cnt workers). Each worker backfills data in batches. If there are many Update or Replace operations during the Add Index process, the larger the batch size, the higher the probability of transaction conflicts. In this case, it is recommended to reduce the batch size, with a minimum value of 32. In the absence of transaction conflicts, the batch size can be set to a larger value, with a maximum of 10240, which speeds up data backfilling but also increases the write pressure on TiKV.
tidb_ddl_reorg_priority
Scope: SESSION
Default value: PRIORITY_LOW
This variable is used to set the execution priority of the re-organize phase of the ADD INDEX operation. It can be set to PRIORITY_LOW/PRIORITY_NORMAL/PRIORITY_HIGH.
Generally, the default value is sufficient.
It is crucial to perform the add index operation during low business peak periods or overall business idle periods to avoid a large number of write-write conflicts. Adding an index to a large table can cause high load on TiKV.
Creating an index consumes a lot of resources when scanning the table to backfill the index, and it may conflict with frequently updated fields, affecting normal business operations. The process of creating an index on a large table often takes a long time, so it is essential to balance execution time and cluster performance as much as possible, such as choosing non-peak update periods.
Parameter Adjustment:
Currently, the main parameters used to dynamically adjust the index creation speed are tidb_ddl_reorg_worker_cnt and tidb_ddl_reorg_batch_size. Generally, the smaller their values, the less impact on the system, but the longer the execution time.
In general, start with the default values of 4 and 256, observe the cluster resource usage and response speed, then gradually increase the tidb_ddl_reorg_worker_cnt parameter to increase concurrency. If the system does not show significant jitter, gradually increase the tidb_ddl_reorg_batch_size parameter. However, if the columns involved in the index are frequently updated, it will cause a large number of conflicts and retries.
Additionally, you can adjust the tidb_ddl_reorg_priority parameter to PRIORITY_HIGH to keep the index creation task at a high priority to speed up the process. However, in general OLTP systems, it is recommended to keep the default value.
SET GLOBAL tidb_ddl_reorg_worker_cnt = 16;
SET GLOBAL tidb_ddl_reorg_batch_size = 10240;