TiDB Large Table Index Creation Time Exceeds Expectations and Is Still Incomplete

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB大表创建索引时间超长尚未完成

| username: jingyesi3401

[TiDB Usage Environment] Production environment, testing

[TiDB Version] v5.1.0

[Encountered Problem] The following index creation command in TiDB v5.1.0 has been executing for 4-5 days and has not yet completed. Is there any way to check its progress?
Image 1: Index creation command

Image 2: Index creation time

Image 3: Table information

| username: dockerfile | Original post link

admin show ddl jobs to check the progress of the DDL.

| username: OnTheRoad | Original post link

There was a similar case before, caused by a bug where GC could not delete multi-version data. For more details, see the link: https://asktug.com/t/topic/903353/22

| username: jingyesi3401 | Original post link

You can see the process display, but the progress is slow.

  1. October 26, 2022, 14:52

  2. October 26, 2022, 14:56

| username: OnTheRoad | Original post link

If the slow index addition is not caused by a bug, you can allocate more resources to ADD INDEX using the following method, but it will affect online business.

| username: jingyesi3401 | Original post link

The tidb_ddl_reorg_worker_cnt has been adjusted from 4 to 8. If it doesn’t work, we will adjust it to 16.

| username: OnTheRoad | Original post link

The image is not visible. Please provide the text content for translation.

| username: jingyesi3401 | Original post link

The image you provided is not accessible. Please provide the text content you need translated.

| username: jingyesi3401 | Original post link

However, adjusting tidb_ddl_reorg_priority reports the following error:


According to the prompt, the following command should be used during the create index process, but unfortunately, the create index command has already been executed.
set tidb_ddl_reorg_priority=‘PRIORITY_HIGH’;

| username: 近墨者zyl | Original post link

Test cases for adjusting tidb_ddl_reorg_worker_cnt and tidb_ddl_reorg_batch_size on the official website regarding add index under write load and query load

  1. Relevant Parameters

tidb_ddl_reorg_worker_cnt

Scope: GLOBAL

Default value (before version 2.1.17): 16 Default value (version 2.1.17 and later): 4

This variable is used to set the concurrency of the re-organize phase of DDL operations.

tidb_ddl_reorg_batch_size

Scope: GLOBAL

Default value (before version 2.1.17): 1024 Default value (version 2.1.17 and later): 256

This variable is used to set the batch size for the re-organize phase of DDL operations. For example, the Add Index operation requires backfilling index data, which is done by multiple workers concurrently (tidb_ddl_reorg_worker_cnt workers). Each worker backfills data in batches. If there are many Update or Replace operations during the Add Index process, the larger the batch size, the higher the probability of transaction conflicts. In this case, it is recommended to reduce the batch size, with a minimum value of 32. In the absence of transaction conflicts, the batch size can be set to a larger value, with a maximum of 10240, which speeds up data backfilling but also increases the write pressure on TiKV.

tidb_ddl_reorg_priority

Scope: SESSION

Default value: PRIORITY_LOW

This variable is used to set the execution priority of the re-organize phase of the ADD INDEX operation. It can be set to PRIORITY_LOW/PRIORITY_NORMAL/PRIORITY_HIGH.

Generally, the default value is sufficient.

  1. It is crucial to perform the add index operation during low business peak periods or overall business idle periods to avoid a large number of write-write conflicts. Adding an index to a large table can cause high load on TiKV.

Creating an index consumes a lot of resources when scanning the table to backfill the index, and it may conflict with frequently updated fields, affecting normal business operations. The process of creating an index on a large table often takes a long time, so it is essential to balance execution time and cluster performance as much as possible, such as choosing non-peak update periods.

Parameter Adjustment:

Currently, the main parameters used to dynamically adjust the index creation speed are tidb_ddl_reorg_worker_cnt and tidb_ddl_reorg_batch_size. Generally, the smaller their values, the less impact on the system, but the longer the execution time.

In general, start with the default values of 4 and 256, observe the cluster resource usage and response speed, then gradually increase the tidb_ddl_reorg_worker_cnt parameter to increase concurrency. If the system does not show significant jitter, gradually increase the tidb_ddl_reorg_batch_size parameter. However, if the columns involved in the index are frequently updated, it will cause a large number of conflicts and retries.

Additionally, you can adjust the tidb_ddl_reorg_priority parameter to PRIORITY_HIGH to keep the index creation task at a high priority to speed up the process. However, in general OLTP systems, it is recommended to keep the default value.

SET GLOBAL tidb_ddl_reorg_worker_cnt = 16;
SET GLOBAL tidb_ddl_reorg_batch_size = 10240;
| username: 近墨者zyl | Original post link

It must be done during low business idle periods.

| username: forever | Original post link

First, check the GC. For such a large table, the GC impact is quite significant.

| username: alfred | Original post link

Check the IO usage. If the IO utilization is high, creating an index and backfilling data with such a large amount of data can indeed be quite slow.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.