Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Converting to a partitioned table takes a very long time (转成分区表耗时很久)
Previously, when I checked DDL progress while adding an index, I remember that row_count represented the number of rows processed so far. I'm quite curious why row_count reaches 300 million when partitioning a table with only 4 million rows.
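For reference, DDL progress in TiDB (including the ROW_COUNT column discussed here) can be inspected like this; the job ID below is just a placeholder:

```sql
-- List running and recently finished DDL jobs; ROW_COUNT shows how many
-- rows the reorg phase has processed so far.
ADMIN SHOW DDL JOBS;

-- Narrow it down, e.g. to jobs still running (10 = number of jobs to show):
ADMIN SHOW DDL JOBS 10 WHERE state = 'running';
```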
ALTER TABLE like this is very common. With 348,715,609 rows of data it should be a very large table, so it's normal for it to be slow.
- There might be an issue with the statistics algorithm.
- There might be excessive MVCC.
- There might have been writes to the table during DDL execution.
You can provide feedback in the feedback section.
Bug Report
Clearly and accurately describe the issue you have found. Providing any steps to reproduce the issue can help the development team address the problem promptly.
[TiDB Version]
[Impact of the Bug]
[Possible Steps to Reproduce the Issue]
[Observed Unexpected Behavior]
[Expected Behavior]
[Related Components and Specific Versions]
[Other Background Information or Screenshots]
Such as cluster topology, system and kernel version, application information, etc. If the issue is related to SQL, please provide the SQL statements and the related table schema. If there are critical errors in the node logs, please provide the relevant log content or files. If some information is business-sensitive and inconvenient to share, please leave your contact information and we will communicate with you privately.
Please provide feedback in this manner~
How long is the GC lifetime set to? Could this be related to old versions of the data?
Is it because the statistics are inaccurate, or because there are too many historical versions?
Is the number of keys actually stored in TiKV the same as the number of rows this table reports?
It doesn't seem to be related to statistics. Even for an empty table it takes a while. However, copying a partitioned table is very fast, which is quite surprising.
What is this warning? show warnings;
From my observation, it's not that slow on my side, so I'm not sure why it's so slow for you; 48 seconds is quite long. I'm using version v7.5.1. I'll check with the development team to see whether there's anything in TiDB that can be optimized here.
It feels like a historical version issue. Check how long the GC time is set.
The version is 7.5.1, and GC is set to 24h.
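The GC lifetime mentioned above can be checked (and, if historical MVCC versions are the problem, temporarily shortened) like this; the `1h` value below is just an example:

```sql
-- Check the current GC lifetime and the last GC safepoint:
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM mysql.tidb
WHERE VARIABLE_NAME IN ('tikv_gc_life_time', 'tikv_gc_safe_point');

-- On recent versions this is also exposed as a system variable:
SHOW VARIABLES LIKE 'tidb_gc_life_time';

-- Example: shorten the retention window so old versions are collected sooner.
SET GLOBAL tidb_gc_life_time = '1h';
```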
To convert this into a partition table, several tasks need to be done: splitting regions (one region per partition), moving data, and adding indexes. If the data volume is large, it will take longer. There might be other operations as well.
Your SQL creates more than 100 partitions in one statement, so it is expected to be slow.
We’ll look into whether there is room for optimization later.
Moreover, MySQL behaves linearly here as well; it's just a bit faster.
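The conversion being discussed is a single `ALTER TABLE ... PARTITION BY` statement. A hypothetical sketch (table, column, and partition names are made up; the actual statement would define 100+ partitions as described above):

```sql
-- Hypothetical example: convert an existing non-partitioned table into a
-- RANGE-partitioned one by month. TiDB rewrites the data and rebuilds
-- indexes for every partition, which is why large tables take a while.
ALTER TABLE orders
PARTITION BY RANGE COLUMNS (created_at) (
    PARTITION p202401 VALUES LESS THAN ('2024-02-01'),
    PARTITION p202402 VALUES LESS THAN ('2024-03-01'),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);
```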
Isn't a 24h GC lifetime a bit excessive? If the table changes frequently, how many MVCC versions would accumulate?
The table structure has remained unchanged for years. The reason for partitioning this time is that one table has become too large, occupying one-third of the cluster’s storage. Moreover, we only need data from the past six months at most. The script for scheduled cleanup has also impacted the cluster’s performance. Since the new version supports converting to partitioned tables, we are considering partitioning first and then cleaning up directly by partition in the future.
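The cleanup plan described above works because dropping a whole partition is a cheap metadata operation rather than a row-by-row delete. A hypothetical sketch (names made up):

```sql
-- Hypothetical cleanup: remove an entire month of expired data at once.
-- Unlike DELETE ... WHERE, this does not generate per-row MVCC versions
-- for GC to clean up later.
ALTER TABLE orders DROP PARTITION p202401;
```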
24 hours is not long; I’ve seen ones as long as 72 hours on the forum.