DDL Blocking Issue

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ddl阻塞问题

| username: h5n1

【TiDB Version】7.5.0
This morning, I executed two DDL operations:

  1. Added a column to the table ob_host
  2. Set global autocommit=off;
  3. Added an index to the table ob_host (only 6 rows of data)

When adding the index, it got blocked and remained in the running state.

Checking the tidb.log of the DDL leader, I found continuous reports of:
[2024/01/04 10:57:08.458 +08:00] [INFO] [syncer.go:362] ["syncer check all versions, someone is not synced"] [category=ddl] [info="instance ip 10.xxxx, port 4001, id 58d8f3e3-1508-4543-bf52-f7173d80f4ce"] ["ddl job id"=6928] [ver=820]
Detailed logs are attached.

Checking tidb_mdl_view, I found job blocking information:
` | 6928 | ob_monitor | ob_host | create index ob_host_idx2 on ob_host(tenant_name ,db_role) | 933243620 | 01-04 11:06:07.640(446781867331420162) |
| 6928 | ob_monitor | ob_host | create index ob_host_idx2 on ob_host(tenant_name ,db_role) | 933243624 | 01-04 11:08:29.390(446781904490332164) |
| 1399 | ob_monitor | ob_host | alter table ob_host add username varchar(16) ,add password varchar(128) | 933243624 | 01-04 11:08:29.390(446781904490332164) |
| 1399 | ob_monitor | ob_host | alter table ob_host add username varchar(16) ,add password varchar(128) | 933243620 | 01-04 11:06:07.640(446781867331420162) |
| 1399 | ob_monitor | ob_host | alter table ob_host add username varchar(16) ,add password varchar(128) | 933243624 | 01-04 11:08:29.390(446781904490332164) |

The above jobs, except for the currently running blocked one, show information from several days ago, but the DML statement’s trx_start is from today.


Killing the above sessions, upon rechecking, only the session with DDL job=6928 was killed, and the index addition operation completed. The DDL job=1399 was tried twice but was not killed.
image

Issues

  1. When blocked, the tidb.log of the DDL owner shows “schema not synced,” which does not match the actual performance.
  2. The information recorded in tidb_mdl_view is inaccurate. The record time for DDL job=1399 is today, but the job is already synced.
  3. Killing TiDB is ineffective.

In the afternoon, the information in mdl_view disappeared.
image

| username: FutureDB | Original post link

How did you kill it? Normally, killing TiDB should be possible.

| username: wangccsy | Original post link

Adding an index will read the entire table data index.

| username: jiyf | Original post link

The definition of the mdl view table has a bug. It only filters the job states ‘synced’ and ‘cancelled’, but there are other states representing completion, such as ‘done’, which are not filtered out, resulting in incorrect view results.

| username: dba远航 | Original post link

An exception occurred when adding a column with DDL, causing subsequent index addition to fail.