After upgrading TiDB from 5.1.2 to 6.5.3, executing DDL statements gets stuck

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB从5.1.2升级到6.5.3之后,执行DDL会被卡住

| username: dba-kit

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.3
[Encountered Issue: Problem Phenomenon and Impact]
After upgrading the cluster, executing DDL gets stuck in the queueing state. After using admin cancel ddl jobs, it remains in the cancelling state.

The logs during this period are:

[2023/07/02 10:02:58.510 +08:00] [INFO] [session.go:3730] ["CRUCIAL OPERATION"] [conn=5514371870693236863] [schemaVersion=1038] [cur_db=] [sql="RENAME TABLE `db_1`.`share_compare` TO `db_1`.`share_compare_bak_20230702`"] [user=core-dm-syncer@%]
[2023/07/02 10:02:58.511 +08:00] [INFO] [txn.go:55] ["Try to create a new txn inside a transaction auto commit"] [conn=5514371870693236863] [schemaVersion=1038] [txnStartTS=442568115084591117] [txnScope=global]
[2023/07/02 10:02:58.511 +08:00] [INFO] [tidb.go:270] ["rollbackTxn called due to ddl/autocommit failure"]
[2023/07/02 10:02:58.511 +08:00] [WARN] [session.go:2242] ["run statement failed"] [conn=5514371870693236863] [schemaVersion=1038] [error="[schema:1050]Table 'db_1.share_compare_bak_20230702' already exists"] [session="{\n  \"currDBName\": \"\",\n  \"id\": 5514371870693236863,\n  \"status\": 2,\n  \"strictMode\": false,\n  \"user\": {\n    \"Username\": \"core-dm-syncer\",\n    \"Hostname\": \"172.18.xxx.xxx\",\n    \"CurrentUser\": false,\n    \"AuthUsername\": \"core-dm-syncer\",\n    \"AuthHostname\": \"%\"\n  }\n}"]
[2023/07/02 10:02:58.511 +08:00] [INFO] [conn.go:1181] ["command dispatched failed"] [conn=5514371870693236863] [connInfo="id:5514371870693236863, addr:172.18.244.12:37298 status:10, collation:utf8mb4_general_ci, user:core-dm-syncer"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="RENAME TABLE `db_1`.`share_compare` TO `db_1`.`share_compare_bak_20230702`"] [txn_mode=OPTIMISTIC] [timestamp=442568115084591124] [err="[schema:1050]Table 'db_1.share_compare_bak_20230702' already exists\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/normalize.go:164\ngithub.com/pingcap/tidb/ddl.ExtractTblInfos\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:6059\ngithub.com/pingcap/tidb/ddl.(*ddl).renameTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:5962\ngithub.com/pingcap/tidb/ddl.(*ddl).RenameTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:5944\ngithub.com/pingcap/tidb/executor.(*DDLExec).executeRenameTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/ddl.go:243\ngithub.com/pingcap/tidb/executor.(*DDLExec).Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/ddl.go:184\ngithub.com/pingcap/tidb/executor.Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:328\ngithub.com/pingcap/tidb/executor.(*ExecStmt).next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1154\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:927\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:776\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:571\ngithub.com/pingcap/tidb/session.runStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2373\ngithub.com/pingcap/tidb/session.(*session).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2237\ngithub.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:252\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2122\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1972\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1403\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1152\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:648\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"]
[2023/07/02 10:02:58.517 +08:00] [WARN] [sysvar_cache.go:84] ["could not find key in global cache"] [name=timestamp]
[2023/07/02 10:02:58.517 +08:00] [INFO] [session.go:1489] ["sysvar not in cache yet. sysvar cache may be stale"] [name=timestamp]
[2023/07/02 10:03:16.539 +08:00] [INFO] [session.go:3730] ["CRUCIAL OPERATION"] [conn=5514371870693236863] [schemaVersion=1038] [cur_db=] [sql="CREATE TABLE IF NOT EXISTS `db_1`.`share_compare` LIKE `db_1`.`share_compare_bak_20230702`"] [user=core-dm-syncer@%]
[2023/07/02 10:03:16.539 +08:00] [INFO] [txn.go:55] ["Try to create a new txn inside a transaction auto commit"] [conn=5514371870693236863] [schemaVersion=1038] [txnStartTS=442568119816552460] [txnScope=global]
[2023/07/02 10:03:16.557 +08:00] [INFO] [ddl_worker.go:314] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:1259, Type:create table, State:queueing, SchemaState:none, SchemaID:54, TableID:1258, RowCount:0, ArgLen:2, start time: 2023-07-02 10:03:16.517 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "] [table=true]
[2023/07/02 10:03:16.559 +08:00] [INFO] [ddl.go:1011] ["[ddl] start DDL job"] [job="ID:1259, Type:create table, State:queueing, SchemaState:none, SchemaID:54, TableID:1258, RowCount:0, ArgLen:2, start time: 2023-07-02 10:03:16.517 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="CREATE TABLE IF NOT EXISTS `db_1`.`share_compare` LIKE `db_1`.`share_compare_bak_20230702`"]
[2023/07/02 10:03:46.261 +08:00] [INFO] [domain.go:2318] ["refreshServerIDTTL succeed"] [serverID=2507646] ["lease id"=57678907b3df78e1]
[2023/07/02 10:04:02.490 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=40.292596ms]
[2023/07/02 10:04:02.490 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=40.391746ms]
[2023/07/02 10:04:16.629 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=60.090551825s] [conn_id=5514371870693236863] [user=core-dm-syncer] [txn_start_ts=442568119816552460] [mem_max="0 Bytes (0 Bytes)"] [sql="CREATE TABLE IF NOT EXISTS `db_1`.`share_compare` LIKE `db_1`.`share_compare_bak_20230702`"]
| username: dba-kit | Original post link

Here are some additional screenshots:
Before DDL cancellation:


After DDL cancellation:

| username: tidb狂热爱好者 | Original post link

It is necessary to restart TiDB because all frontends are unable to elect DDL.

| username: tidb狂热爱好者 | Original post link

Another reason is that the execution time exceeds max_execution_time. Check if you have set the maximum execution time.

| username: dba-kit | Original post link

Indeed, the maximum execution time has been changed to 10 seconds. Is it impossible to cancel after exceeding max_execution_time?

| username: tidb狂热爱好者 | Original post link

Yes, you should cancel this setting.

| username: tidb狂热爱好者 | Original post link

Cancel this setting and change it back to 10 seconds after completing the DDL task.

| username: 啦啦啦啦啦 | Original post link

Did the old version not have max_execution_time set?

| username: buptzhoutian | Original post link

max_execution_time is currently only used to control the maximum execution time of read-only statements.

| username: zhanggame1 | Original post link

The documentation states that max_execution_time is only effective for read-only statements and does not mention anything related to DDL.

| username: dba-kit | Original post link

The issue has been identified. It was caused by one of the tidb-servers not being reloaded during the TiDB version upgrade. After reloading it just now, the DDL is functioning normally. :upside_down_face:

| username: zhanggame1 | Original post link

Did you forget to reload one machine or did one of the reloads fail?

| username: dba-kit | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.