TiDB: removing indexes is very slow

Application environment:

Production

TiDB version: 7.1.2

Reproduction method:

Drop 4 indexes from a single big table (162,762,548 rows)

Problem:

Hi, we tried to remove 4 indexes from a single table with one command, like:
ALTER TABLE test
DROP INDEX sourceId_2,
DROP INDEX ref,
DROP INDEX ref2,
DROP INDEX url,
algorithm=INSTANT

And it is still running after more than 24 hours…
Additional info:
This table is added to tiflash.

According to the documentation, dropping an index should be a fast operation, so why is this happening?

I’m sorry to hear that you’re experiencing slow index removal in TiDB. Let me try to help you with that.

Based on the information you provided, you are trying to remove four indexes from a single table with a large number of rows. The slow execution of the ALTER TABLE command could be caused by several factors. Here are a few possible reasons:

  1. Table size: The size of the table with 162,762,548 rows could impact the execution time of the index removal. Dropping indexes on a large table may take a significant amount of time, especially if the table is actively being used.

  2. Concurrency control: By default, TiDB uses optimistic concurrency control (OCC) to handle concurrent transactions. This means that when you drop indexes, the operation needs to acquire locks on the table and the affected rows. If there are other transactions accessing the table concurrently, it may cause contention and slow down the index removal process.

  3. TiFlash: You mentioned that the table is added to TiFlash. TiFlash is an analytical storage engine for TiDB, and it has its own indexing mechanism. When you drop indexes on a table in TiDB, the corresponding indexes in TiFlash also need to be updated. This additional step may contribute to the longer execution time.

To troubleshoot the issue, I recommend the following steps:

  1. Check the TiDB log: Review the TiDB log files to see if there are any error messages or warnings related to the index removal operation. The log files can provide valuable insights into the execution process and any potential issues.

  2. Monitor system resources: Monitor the system resources, such as CPU, memory, and disk usage, during the index removal process. High resource utilization could indicate a bottleneck that is causing the slow execution.

  3. Consider splitting the operation: Instead of removing all four indexes in a single command, you can try removing them one by one (see the example after this list). This approach may distribute the workload and potentially improve the execution time.

  4. Optimize the table structure: If possible, consider optimizing the table structure to reduce the size and complexity of the indexes. This could involve reevaluating the necessity of certain indexes or redesigning the table schema to better suit your application’s needs.
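For example, a minimal sketch of splitting the drop into separate statements, one index per ALTER (index names taken from your original command; run each one and let its DDL job finish before starting the next):

ALTER TABLE test DROP INDEX sourceId_2;
ALTER TABLE test DROP INDEX ref;
ALTER TABLE test DROP INDEX ref2;
ALTER TABLE test DROP INDEX url;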

Please note that these are general suggestions, and the actual solution may vary depending on your specific environment and workload. If the issue persists, I recommend reaching out to the TiDB community or consulting with the TiDB support team for further assistance.

I hope this information helps! Let me know if you have any further questions.

This is not very reasonable. Could you show the result of ‘admin show ddl jobs’?

JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE
1785 | test | source_estates | alter table multi-schema change | none | 90 | 1080 | 0 | 2023-12-06 13:45:46 | 2023-12-06 13:45:46 | | running
1785 | test | source_estates | drop index /* subjob */ | none | 90 | 1080 | 0 | | | | done
1785 | test | source_estates | drop index /* subjob */ | none | 90 | 1080 | 0 | | | | done
1785 | test | source_estates | drop index /* subjob */ | write only | 90 | 1080 | 0 | | | | running
1785 | test | source_estates | drop index /* subjob */ | write only | 90 | 1080 | 0 | | | | running
1784 | test | flat_locations | create view | public | 90 | 1783 | 0 | 2023-12-05 16:30:38 | 2023-12-05 16:30:38 | 2023-12-05 16:30:38 | synced
1782 | test | estates | add column | public | 90 | 1066 | 0 | 2023-11-29 18:10:27 | 2023-11-29 18:10:28 | 2023-11-29 19:13:28 | synced


Do those two running jobs show any errors? You can scroll to the right in the output to check the remaining columns.
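If scrolling is awkward, a hedged example of narrowing the output (the WHERE filter is documented for ADMIN SHOW DDL JOBS; the \G vertical-output terminator assumes the mysql command-line client):

ADMIN SHOW DDL JOBS 10 WHERE state != 'synced'\G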

I reached out to the engineers of this component, and they told me it may be related to the Metadata Lock (Metadata Lock | PingCAP Docs), which was introduced in v6.3.0.

Try this statement to check whether any DML statements are blocking the DDL statement:

SELECT * FROM mysql.tidb_mdl_view\G

You have two options (a sketch of both follows below):

  1. Kill the blocking DML statement.
  2. Set the system variable tidb_enable_metadata_lock to ‘OFF’.
  • This variable controls whether the Metadata Lock feature is enabled. Note that when setting this variable, you need to make sure that there are no running DDL statements in the cluster. Otherwise, the data might be incorrect or inconsistent.
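A minimal sketch of both options, assuming the mysql client; the connection id passed to KILL is illustrative and should be replaced with the session_id reported by mysql.tidb_mdl_view for the blocked job:

-- option 1: identify the blocking session (same query as above), then kill it
SELECT * FROM mysql.tidb_mdl_view\G
KILL 1234567;            -- illustrative id; on some deployments you may need KILL TIDB 1234567;

-- option 2: disable the Metadata Lock feature (only when no DDL statements are running)
SET GLOBAL tidb_enable_metadata_lock = OFF;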

Okay, our ALTER finished this morning, after almost 2 days. Thank you for the hint about SELECT * FROM mysql.tidb_mdl_view; we will use it next time to understand what is blocking a DDL.