DDL Stuck and Unresponsive

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DDL卡住不动

| username: wakaka

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.2
[Encountered Problem] DDL execution gets stuck at a certain point for a long time
[Reproduction Path] Creating index DDL
[Problem Phenomenon and Impact]
The table has about 170 million rows of data. Adding an index gets stuck at around 140 million rows for half an hour without moving.


Checked the DDL worker logs and found no errors
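To confirm whether the backfill is actually progressing, you can watch the row count of the running DDL job from a SQL client (these are standard TiDB admin statements; run them repeatedly and compare `ROW_COUNT`):

```sql
-- Show running and recently finished DDL jobs;
-- ROW_COUNT is the number of rows backfilled so far
ADMIN SHOW DDL JOBS;

-- Show the current DDL owner and the statement being executed
ADMIN SHOW DDL;
```

If `ROW_COUNT` stays unchanged across several minutes, the job is genuinely stalled rather than just slow.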

[Attachment]

| username: forever | Original post link

First, check whether there are too many data versions and whether GC hasn’t cleaned them up. There was a post discussing this issue a few days ago:
900 million table adding index hasn’t finished executing after more than a day, how to troubleshoot? - TiDB - TiDB Q&A Community (asktug.com)
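To rule out the GC angle, you can inspect the GC settings and the last safe point recorded in the `mysql.tidb` table (a sketch; the relevant variable names start with `tikv_gc`):

```sql
-- Check GC life time, run interval, and the last GC safe point
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM mysql.tidb
WHERE VARIABLE_NAME LIKE 'tikv_gc%';
```

If the safe point is far behind the current time, old MVCC versions may be piling up and slowing down the index scan.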

| username: wakaka | Original post link

At that time, GC was set to 10 minutes, and there wasn’t a large number of data versions.

| username: wakaka | Original post link

Following the post’s instructions didn’t work either.

| username: forever | Original post link

Have you resolved it? :grin:

| username: HACK | Original post link

It feels like this problem is quite common. I’ve seen several people experiencing this issue.

| username: xfworld | Original post link

When adding an index to a table with existing data, there will be an index rebuilding step. If it’s too slow, you can adjust the backfill speed. However, this will have a significant performance impact on the cluster…
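The backfill speed is controlled by session/global system variables in TiDB 5.x (defaults shown in the comments are the v5.x defaults; raising them increases load on TiKV, so adjust cautiously on a production cluster):

```sql
-- More concurrent backfill workers (default: 4)
SET GLOBAL tidb_ddl_reorg_worker_cnt = 8;

-- Larger backfill batch size (default: 256)
SET GLOBAL tidb_ddl_reorg_batch_size = 1024;

-- Optionally raise the priority of the reorg transactions
-- (default: PRIORITY_LOW)
SET GLOBAL tidb_ddl_reorg_priority = 'PRIORITY_HIGH';
```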

| username: wakaka | Original post link

I tried 3 times and adjusted the parameters, but it didn’t work. It always gets stuck at the point shown in the screenshot and doesn’t proceed.

| username: xfworld | Original post link

Cancel it and try again.
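A stuck `ADD INDEX` job can be cancelled with `ADMIN CANCEL DDL JOBS` (the job ID below is a placeholder; take the real one from `ADMIN SHOW DDL JOBS`):

```sql
-- Find the job ID of the stuck ADD INDEX job
ADMIN SHOW DDL JOBS;

-- Cancel it (123 is a placeholder job ID)
ADMIN CANCEL DDL JOBS 123;
```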

| username: wakaka | Original post link

I tried 3 times and waited several hours each time, but it didn’t work.

| username: xfworld | Original post link

Create a new table with the same structure, build all the indexes on it, move the data over… then drop the original table and rename the new one…
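That workaround could look roughly like this (a sketch; `t`, `t_new`, and `idx_col` are placeholder names, and for a 170-million-row table the `INSERT … SELECT` should be split into batches, e.g. by primary-key range, with writes to the original table paused before the final swap):

```sql
-- 1. Create an empty table with the same schema
CREATE TABLE t_new LIKE t;

-- 2. Add the desired index while the table is still empty
ALTER TABLE t_new ADD INDEX idx_col (col);

-- 3. Copy the data over (batch this for large tables)
INSERT INTO t_new SELECT * FROM t;

-- 4. Swap the tables atomically
RENAME TABLE t TO t_old, t_new TO t;

-- 5. Drop the old table once verified
DROP TABLE t_old;
```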

| username: alfred | Original post link

Is it possible to do session tracking or Linux process tracing to see where it is stuck?
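One way to do this is to dump goroutine stacks from the tidb-server that owns the DDL, via its status port (a sketch; 10080 is the default status port, and the address is an assumption about your deployment):

```shell
# Dump all goroutine stacks from the DDL owner's tidb-server
# (10080 is the default TiDB status port)
curl -s "http://127.0.0.1:10080/debug/pprof/goroutine?debug=2" > goroutines.txt

# Look for goroutines stuck in the DDL/backfill code paths
grep -n -A 5 "ddl" goroutines.txt | head -n 50
```

Stacks that stay identical across two dumps taken minutes apart point at where the worker is blocked.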

| username: xfworld | Original post link

I feel that this issue is still affected by a bug. If possible, upgrade to 5.2.4.
The method I mentioned can help you get past it…
But to completely resolve it, you still need to upgrade…

| username: wakaka | Original post link

Will upgrading introduce any new issues? It seems like every problem I encounter requires an upgrade to resolve. I checked the 5.2.4 fix list and didn’t see this bug listed as fixed.

| username: xfworld | Original post link

These are all old issues with data processing :rofl:

I recommend upgrading to 5.2.4, as it has fixed some known issues.

| username: wakaka | Original post link

I’m concerned that upgrading from my previous version 5.0.6 across such a large gap might introduce additional bugs, and I’m not sure which version would be appropriate to upgrade to.

| username: xfworld | Original post link

If you’re concerned, you can set up a test environment, run a POC, and base your evaluation on the results.

| username: wakaka | Original post link

The time and cost are also an issue, especially for a large cluster of over 50 TB. Adding an index alone is already too big an operation. :innocent:

| username: xiaohetao | Original post link

:+1::+1:

| username: xfworld | Original post link

I’ve just written this up; you can refer to it.