Frequent DDL Stalls and Blockages

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DDL 频繁卡住阻塞

| username: TiDBer_vFs1A6CZ

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.1
[Reproduction Path] Regularly perform truncate operations on the table
[Encountered Problem: Phenomenon and Impact] The table truncate statement gets stuck, causing subsequent tasks to pile up
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachments: Screenshots/Logs/Monitoring]

Attempted Measures:
Failed to cancel DDL statement
ADMIN CANCEL DDL JOBS job_id

Triggered TiDB Leader re-election, but still couldn’t release the stuck DDL
curl -X POST http://xxxx:10080/ddl/owner/resign

Currently successful measure is to restart TiDB Server one by one to release the stuck DDL.
However, the stuck rate is very frequent; it got stuck again today after being released yesterday. Restarting TiDB Server each time has a significant impact on work tasks.

I would like to inquire if there is any way to release the stuck task without restarting, or to avoid DDL getting stuck.

| username: 啦啦啦啦啦 | Original post link

SELECT * FROM mysql.tidb_mdl_view
Check if there is a metadata lock.

| username: TiDBer_vFs1A6CZ | Original post link

Querying this table is very slow, and it hasn’t returned any data.

| username: vincentLi | Original post link

How many regions do you have? I’ve also encountered slow metadata queries.

| username: TiDBer_vFs1A6CZ | Original post link

Each TiKV node shows 60K regions.

| username: Kongdom | Original post link

:joy: Did the issue from last year not get completely resolved, and now a similar problem has appeared again this year? Does restarting no longer work?

| username: TiDBer_vFs1A6CZ | Original post link

Restarting works, but the impact is significant. How can we avoid DDL blocking, or release DDL blocking tasks without restarting?

| username: Kongdom | Original post link

:thinking: Are DDL statements recorded in the slow query log? If so, you can check which step is slow.

| username: TiDBer_vFs1A6CZ | Original post link

Yes, these two DDL executions of truncate were blocked for more than 7 hours. It was blocked once yesterday and again today.

| username: Kongdom | Original post link

Is there an execution plan when you open it?

| username: TiDBer_vFs1A6CZ | Original post link

Just one truncate statement, nothing else.

| username: Kongdom | Original post link

Look, this is where the metadata lock was just enabled. Try disabling the metadata lock to see if it has any effect?

| username: Kongdom | Original post link

It could also be a transaction issue.

| username: TiDBer_vFs1A6CZ | Original post link

Checking the metadata lock is in the open state, what other impacts will there be if it is closed?

| username: Kongdom | Original post link

This was disabled by default before version 6.5, and it was only enabled by default starting from 6.5. I understand that disabling it has no impact.

| username: TiDBer_vFs1A6CZ | Original post link

Thank you. I have already released the blocking task, reassigned the TiDBServer node, and checked that the schema information is correct. Let’s see if it will still be blocked tomorrow. I will try to disable the metadata lock if it gets blocked again.

| username: Kongdom | Original post link

:handshake: :handshake: :handshake: