Index Addition Stuck in SCHEMA_STATE Delete Only State

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 加索引卡在 SCHEMA_STATE delete only 状态

| username: a398058068

[TiDB Usage Environment] Production Environment / Test / Poc
Production Environment
[TiDB Version]
5.7.25-TiDB-v6.5.0
[Reproduction Path] Operations performed that led to the issue
A normal ADD INDEX on a timestamp column, using an ordinary BTREE index
[Encountered Issue: Problem Phenomenon and Impact]
ADMIN SHOW DDL JOBS; shows the job stuck in the SCHEMA_STATE delete only state

Tried ADMIN CANCEL DDL JOBS, which also got stuck; the final job status stays at cancelling
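
For reference, a minimal sketch of the statements involved (not from the original post; the job ID is only illustrative):

```sql
-- The DDL that got stuck (table and column names as in the logs below).
ALTER TABLE `jinka_data`.`dy_video`
    ADD INDEX `idx_update_time`(`update_time`) USING BTREE;

-- Check progress; the job stays at SCHEMA_STATE = delete only.
ADMIN SHOW DDL JOBS;

-- Attempted cancellation; replace 117697 with the actual job ID from the output above.
ADMIN CANCEL DDL JOBS 117697;
```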
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Corresponding TiDB logs

[2023/01/09 13:46:49.584 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip 172.18.150.111, port 4000, id 0705c04b-250f-44d8-9bd1-83a77063782b"] ["ddl id"=117697] [ver=121240]
…[txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="ALTER TABLE `jinka_data`.`dy_video` \nADD INDEX `idx_update_time`(`update_time`) USING BTREE"]
[2023/01/09 13:45:07.253 +08:00] [INFO] [range_task.go:243] ["range task finished"] [name=resolve-locks-runner] [startKey=] [endKey=] ["cost time"=14.811660345s] ["completed regions"=63090]
[2023/01/09 13:45:07.253 +08:00] [INFO] [gc_worker.go:1095] ["[gc worker] finish resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630558572740623] [regions=63090]
[2023/01/09 13:45:45.803 +08:00] [INFO] [domain.go:2298] ["refreshServerIDTTL succeed"] [serverID=2511748] ["lease id"=7065857a8c81f7fb]
[2023/01/09 13:45:52.439 +08:00] [INFO] [gc_worker.go:454] ["[gc worker] gc safepoint blocked by a running session"] [uuid=615ea52f0dc0001] [globalMinStartTS=438626847504990215] [globalMinStartAllowedTS=438626847504990214] [safePoint=438630495658180608]
[2023/01/09 13:45:52.442 +08:00] [INFO] [gc_worker.go:601] ["[gc worker] last safe point is later than current one.No need to gc.This might be caused by manually enlarging gc lifetime"] ["leaderTick on"=615ea52f0dc0001] ["last safe point"=2023/01/09 09:43:55.807 +08:00] ["current safe point"=2023/01/09 09:43:55.807 +08:00]
[2023/01/09 13:45:52.442 +08:00] [INFO] [gc_worker.go:1073] ["[gc worker] start resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630574301380623] [concurrency=3]
[2023/01/09 13:45:52.442 +08:00] [INFO] [range_task.go:137] ["range task started"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=3]
[2023/01/09 13:45:59.207 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=11233.248725059s] [conn_id=5523392264086313751] [user=root] [database=jinka_data] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="ALTER TABLE `jinka_data`.`dy_video` \nADD INDEX `idx_update_time`(`update_time`) USING BTREE"]
[2023/01/09 13:46:07.161 +08:00] [INFO] [range_task.go:243] ["range task finished"] [name=resolve-locks-runner] [startKey=] [endKey=] ["cost time"=14.718393804s] ["completed regions"=63086]
[2023/01/09 13:46:07.161 +08:00] [INFO] [gc_worker.go:1095] ["[gc worker] finish resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630574301380623] [regions=63086]
[2023/01/09 13:46:52.440 +08:00] [INFO] [gc_worker.go:454] ["[gc worker] gc safepoint blocked by a running session"] [uuid=615ea52f0dc0001] [globalMinStartTS=438626847504990215] [globalMinStartAllowedTS=438626847504990214] [safePoint=438630511386820608]
[2023/01/09 13:46:52.444 +08:00] [INFO] [gc_worker.go:601] ["[gc worker] last safe point is later than current one.No need to gc.This might be caused by manually enlarging gc lifetime"] ["leaderTick on"=615ea52f0dc0001] ["last safe point"=2023/01/09 09:43:55.807 +08:00] ["current safe point"=2023/01/09 09:43:55.807 +08:00]
[2023/01/09 13:46:52.444 +08:00] [INFO] [gc_worker.go:1073] ["[gc worker] start resolve locks"] [uuid=615ea52f0dc0001] [safePoint=0] [try-resolve-locks-ts=438630590030020623] [concurrency=3]
[2023/01/09 13:46:52.444 +08:00] [INFO] [range_task.go:137] ["range task started"] [name=resolve-locks-runner] [startKey=] [endKey=] [concurrency=3]
[2023/01/09 13:46:59.307 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=11293.348475744s] [conn_id=5523392264086313751] [user=root] [database=jinka_data] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="ALTER TABLE `x`.`x` \nADD INDEX `idx_update_time`(`update_time`) USING BTREE"]
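
A possible follow-up check, given the "gc safepoint blocked by a running session" lines above: convert the logged globalMinStartTS into wall-clock time and look for transactions that have been running since before it. This is only a sketch; it assumes the v6.5 information_schema.cluster_tidb_trx table and the TIDB_PARSE_TSO() function, and the column names may differ slightly in your version.

```sql
-- Convert the blocking session's start TSO (taken from the GC log above) into a timestamp.
SELECT TIDB_PARSE_TSO(438626847504990215);

-- List the longest-running transactions cluster-wide to find the one holding that start ts.
SELECT instance, session_id, start_time, state, current_sql_digest_text
FROM information_schema.cluster_tidb_trx
ORDER BY start_time
LIMIT 10;
```
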
| username: Lucien-卢西恩 | Original post link

Check the TiDB log for any error messages with the keyword “[ddl] wait latest schema version change(get the metadata lock if tidb_enable_lock is true)”.
Reference documentation: Metadata Lock | PingCAP Documentation Center
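
If metadata locks are suspected, a quick check could look like this (a sketch assuming v6.5's built-in mysql.tidb_mdl_view is available and the metadata lock feature is on):

```sql
-- Confirm whether the metadata lock feature is enabled.
SHOW VARIABLES LIKE 'tidb_enable_metadata_lock';

-- List DDL jobs blocked by metadata locks, together with the blocking sessions and queries.
SELECT * FROM mysql.tidb_mdl_view;
```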

| username: a398058068 | Original post link

There was nothing. I checked the logs from that time; they just kept repeating the ADD INDEX statement, nothing else.

| username: Lucien-卢西恩 | Original post link

I didn’t understand what you meant. Could it be that manually retrying the same DDL statement multiple times caused this? Has it been resolved now?

| username: a398058068 | Original post link

The instance recovered after a restart.
It was just a normal index addition, nothing else.

| username: Lucien-卢西恩 | Original post link

This needs to be investigated by checking the TiDB server node that is the DDL owner and the DDL owner's working status for any obvious errors. The logs provided so far are not enough to pinpoint the cause. Keep monitoring the cluster, and if the job gets stuck again, capture a goroutine dump to help with troubleshooting. Thank you ~
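
A rough sketch of those checks (the goroutine capture assumes the default TiDB status port 10080; adjust for your deployment):

```sql
-- Identify the current DDL owner; the output includes the owner's ID and address.
ADMIN SHOW DDL;

-- If the job gets stuck again, capture a goroutine dump from the owner's status port,
-- e.g. from a shell:  curl http://<ddl-owner-ip>:10080/debug/pprof/goroutine?debug=2
```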