Index Building Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 建索引问题 (Index creation problem)

| username: TiDBer_Terry261

[TiDB Usage Environment] Test Environment
[TiDB Version] 6.6.0
[Reproduction Path] Creating Index
[Encountered Problem: Phenomenon and Impact]
After executing the SQL to create an index, I checked the status through ADMIN SHOW DDL JOBS and found that it hasn’t changed for a long time.


Is there a problem? How can I resolve it?

[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
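
For reference, the status check described above is typically done like this (a minimal sketch):

```sql
-- List recent DDL jobs: JOB_TYPE, SCHEMA_STATE, STATE and ROW_COUNT
-- show whether the ADD INDEX job is queueing, running, or done.
ADMIN SHOW DDL JOBS;
```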

| username: xfworld | Original post link

We can only wait; DDL is an asynchronous operation.

| username: TiDBer_Terry261 | Original post link

The cluster currently has no load, precisely so that index creation can go faster. All the related DDL parameters have also been increased.
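
For context, the DDL parameters usually raised to speed up index creation look roughly like this; the values below are illustrative, not the poster's actual settings:

```sql
-- More backfill workers and larger batches for the ADD INDEX reorg phase
-- (values are examples only; the defaults are 4 and 256).
SET GLOBAL tidb_ddl_reorg_worker_cnt = 8;
SET GLOBAL tidb_ddl_reorg_batch_size = 1024;

-- Accelerated (ingest-based) index backfill, available since v6.3.
SET GLOBAL tidb_ddl_enable_fast_reorg = ON;
```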

| username: dba-kit | Original post link

The gap between the job's creation time and its start time is quite large. What is your machine configuration, and how much data is in the table being modified?

| username: TiDBer_Terry261 | Original post link

Roughly how long do I have to wait? I've already been waiting for more than ten hours.

| username: TiDBer_Terry261 | Original post link

I executed several index creation statements from multiple consoles, so they queued up and run one by one. The first index took about 36 hours. The second one is the one shown in the picture; it has shown no progress for more than ten hours since it started. The table contains approximately 9 billion rows in total.

| username: TiDBer_Terry261 | Original post link

The TiDB logs keep printing WARN messages.

| username: liuis | Original post link

DDL operations are inherently slower when there is a large amount of data.

| username: TiDBer_Terry261 | Original post link

While the first index was being added, the ROW_COUNT field in ADMIN SHOW DDL JOBS kept increasing. For the second index, however, this field stays at 0.
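
A sketch of how to watch that progress; for an ADD INDEX job that is actually backfilling, ROW_COUNT should keep increasing:

```sql
-- Show the 10 most recent jobs that are still running; for an ADD INDEX
-- job, ROW_COUNT is the number of rows already backfilled.
ADMIN SHOW DDL JOBS 10 WHERE state = 'running';
```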

| username: 裤衩儿飞上天 | Original post link

With a large amount of data (9 billion rows) and many indexed columns (4 of them), it is normal for the index build to take a long time.

| username: xingzhenxiang | Original post link

I remember that DDL operations are queued, right? Does the second one start running only after the first one is completed?
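
For jobs on the same table, the second ADD INDEX does indeed start its backfill only after the first has finished. The current DDL owner and the jobs it is running can be checked roughly like this:

```sql
-- Show the current DDL owner and the running DDL jobs.
ADMIN SHOW DDL;
```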

| username: TiDBer_Terry261 | Original post link

I canceled all the index creation jobs with ADMIN CANCEL DDL JOBS, then restarted the cluster and found that none of the TiDB nodes could start.

The TiDB nodes keep outputting logs.

Job id=223 is the canceled index creation job. How can I resolve this? Please help.
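
For reference, canceling by job ID and re-checking looks roughly like this; 223 is the job ID from this thread, and a cancelled ADD INDEX still needs time to roll back the index data it has already written:

```sql
-- Cancel DDL jobs by ID (IDs come from ADMIN SHOW DDL JOBS).
ADMIN CANCEL DDL JOBS 223;

-- Check that the job has moved to the cancelling / cancelled state.
ADMIN SHOW DDL JOBS 10;
```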

| username: tidb狂热爱好者 | Original post link

On every TiDB host, run service tidb-4000 stop; service tidb-4000 start, and then the problem should be resolved.

| username: xingzhenxiang | Original post link

Since this seems to be a test environment, I recommend using an LTS version instead.

| username: 魔礼养羊 | Original post link

Here is a rough idea, for what it's worth.

Why don't you try building the index on one day's data first, and then on the most recent seven days' data?

That's roughly the approach. With billions of rows, building anything derived from the data, let alone an index, is going to be slow.

| username: TiDBer_Terry261 | Original post link

On a table with only 110,000 rows, index creation also keeps running and hasn't finished even after two hours. The SCHEMA_STATE stays at write reorganization and the STATE is running. The background log keeps printing.
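
One thing that may be worth ruling out here (only a guess, not a confirmed cause) is a metadata lock: a long-running transaction holding the old schema version can keep a DDL job stuck in the running state. The blocked jobs and the blocking sessions can be listed like this:

```sql
-- DDL jobs blocked by metadata locks and the sessions holding them
-- (view introduced together with metadata locks in v6.3).
SELECT * FROM mysql.tidb_mdl_view;
```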

| username: knull | Original post link

I'd like to confirm: what is the tidb_ddl_distribute_reorg parameter set to?
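
It can be checked with a plain variable query (sketch):

```sql
-- Check whether the experimental distributed reorg is enabled.
SHOW GLOBAL VARIABLES LIKE 'tidb_ddl_distribute_reorg';
```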

| username: TiDBer_Terry261 | Original post link

I think the problem is that the tidb-server process is not running.

| username: knull | Original post link

First of all, distributed reorg is not yet usable in version 6.6. It is recommended not to enable tidb_ddl_distribute_reorg, as enabling it together with fast reorg can trigger bugs.
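
Assuming the v6.6 variable name tidb_ddl_distribute_reorg, turning it off while keeping fast reorg looks like this (a sketch, not an official recommendation beyond what is stated above):

```sql
-- Disable the experimental distributed reorg path (v6.6).
SET GLOBAL tidb_ddl_distribute_reorg = OFF;

-- Fast reorg (ingest-based backfill) can stay enabled on its own.
SHOW GLOBAL VARIABLES LIKE 'tidb_ddl_enable_fast_reorg';
```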

| username: TiDBer_Terry261 | Original post link

Hello, what is TiDB doing right now? This table only has a few hundred thousand rows, but the job has been running for a long time without finishing. The background log keeps printing.