SSD Hard Drive, TiDB v6.5.0, Adding a Composite Index to a Large Table with 200 Million Rows and Over 70 Columns Takes 2 Hours and 37 Minutes. Are There Any Ways to Improve Indexing Efficiency and Reduce Time Consumption?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ssd的硬盘,tidb v6.5.0,对2亿的大表,表有70多个字段,添加3字段的复合索引,消耗2小时37分钟,有没有什么办法,提高添加索引的效率,减少消耗时间?

| username: vcdog

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact] The TiDB cluster uses SSD disks, TiDB v6.5.0. For a large table with 200 million rows and over 70 columns, adding a composite index with 3 columns took from 2023-09-05 18:05:58 to 2023-09-05 20:42:00, totaling 2 hours and 37 minutes.

Is there any way to improve the efficiency of adding indexes and reduce the time consumption?

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: vcdog | Original post link

The link you provided appears to be an image from the AskTUG forum. Unfortunately, I cannot view or translate the content of the image directly. Please provide the text you need translated, and I will be happy to assist you.

| username: wenyi | Original post link

Setting @@global.tidb_ddl_reorg_batch_size to 1024 and @@global.tidb_ddl_reorg_worker_cnt to 8 can improve the index creation speed. However, it is not recommended to set these values too high.

| username: vcdog | Original post link

Thank you, boss. I’ll refer to it and make some adjustments.

| username: 大飞哥online | Original post link

Add index and data backfill parameters. Take it easy when making changes.

| username: h5n1 | Original post link

It looks like you added the index using the traditional method based on your DDL job results. In version 6.5, there is a fast index addition feature with tidb_ddl_enable_fast_reorg. First, check if the default value is On. Were there any other DDL operations when adding the index?

| username: chenhanneu | Original post link

Create the corresponding directory.

| username: cassblanca | Original post link

Will the columns for which indexes are created be frequently updated and modified? This could lead to conflicts and affect normal business operations. If the columns involved in the index are updated very frequently, it will cause a lot of conflicts and retries, making the index creation process take a long time.

You can try adjusting the following parameters:
tidb_ddl_reorg_worker_cnt and tidb_ddl_reorg_batch_size to dynamically adjust the index creation speed. Generally, the smaller their values, the less impact on the system, but the longer the execution time. Additionally, you can set the parameter tidb_ddl_reorg_priority to PRIORITY_HIGH to keep the index creation task at a high priority to speed up the process.

You can also refer to the following test case: Online Workloads and ADD INDEX Operations | PingCAP Documentation Center

| username: TiDBer_vfJBUcxl | Original post link

Best Practices for Adding Indexes

TiDB supports online index addition operations, which can be performed using ADD INDEX or CREATE INDEX. Adding an index does not block data reads and writes in the table. You can adjust the parallelism of the DDL operation re-organize phase and the batch size of index backfill by modifying the following system variables:

To minimize the impact on online business, the default speed for adding indexes is relatively conservative. When the target column for adding an index only involves query load or is not directly related to the online load, you can appropriately increase the above variables to speed up the index addition:
SET @@global.tidb_ddl_reorg_worker_cnt = 16;
SET @@global.tidb_ddl_reorg_batch_size = 4096;

When the target column for the index addition operation is frequently updated (including UPDATE, INSERT, and DELETE), increasing the above configurations can cause more frequent write conflicts, resulting in a larger online load; at the same time, the index addition operation may take a long time to complete due to constant retries. In this case, it is recommended to reduce the above configurations to avoid write conflicts with online business:
SET @@global.tidb_ddl_reorg_worker_cnt = 4;
SET @@global.tidb_ddl_reorg_batch_size = 128;

| username: zhanggame1 | Original post link

If this table doesn’t have many modifications and the CPU resources are sufficient, increasing the parameters can make it several times faster.

| username: 大飞哥online | Original post link

It’s better to do it during off-peak business hours, and then change it back once it’s done. :face_with_peeking_eye: