Converting to a partitioned table takes a long time

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 转成分区表耗时很久

| username: Still4

Previously, I checked the DDL progress when adding an index, and I remember that row_count represents the number of processed rows. I’m quite curious why the row_count reaches 300 million when partitioning a table with 4 million rows.

| username: 江湖故人 | Original post link

ALTER TABLE is very common :smiley:

| username: zhaokede | Original post link

348,715,609 rows of data, it should be a very large table, so it’s normal to be slow.

| username: WalterWj | Original post link

  1. There might be an issue with the statistics algorithm.
  2. There might be excessive MVCC.
  3. There might be changes affecting during the DDL execution process.

You can provide feedback in the feedback section.

| username: Billmay表妹 | Original post link

Bug Report
Clearly and accurately describe the issue you have found. Providing any steps to reproduce the issue can help the development team address the problem promptly.
[TiDB Version]
[Impact of the Bug]

[Possible Steps to Reproduce the Issue]

[Observed Unexpected Behavior]

[Expected Behavior]

[Related Components and Specific Versions]

[Other Background Information or Screenshots]
Such as cluster topology, system and kernel version, application app information, etc. If the issue is related to SQL, please provide the SQL statements and related table schema information. If there are critical errors in the node logs, please provide the relevant node log content or files. If some business-sensitive information is inconvenient to provide, please leave your contact information, and we will communicate with you privately.

Please provide feedback in this manner~

| username: forever | Original post link

How long is the GC setting? Will it be related to the old version of the data?

| username: 这里介绍不了我 | Original post link

Is it because the statistics are inaccurate or there are too many historical versions?

| username: zhanggame1 | Original post link

Is the number of keys in the KV database the same as the number of keys in this table view?

| username: Still4 | Original post link

It doesn’t seem to be related to statistics. Even for an empty table, it takes a bit long. However, copying a partitioned table is very fast, which is quite amazing.

| username: WalterWj | Original post link

What is this warning? show warnings;

| username: WalterWj | Original post link

From my observation, it’s not bad on my side. Not sure why it’s so slow for you. 48 seconds is quite long. I’m using version v7.5.1. Is there anything that can be optimized for TiDB skew? I’ll check with the development team to see if there’s anything they can look into. :thinking:

| username: fshowmia | Original post link

It feels like a historical version issue. Check how long the GC time is set.

| username: Still4 | Original post link

The version is 7.5.1, gc 24h.

| username: WalterWj | Original post link

To convert this into a partition table, several tasks need to be done: splitting regions (one region per partition), moving data, and adding indexes. If the data volume is large, it will take longer. There might be other operations as well.

Your SQL needs to partition more than 100 times at once, so it is expected to be slow.

We’ll look into whether there is room for optimization later.

| username: WalterWj | Original post link

Moreover, in terms of MySQL, it is also linear, just a bit faster.

| username: TiDBer_小阿飞 | Original post link

GC 24H is a bit exaggerated, isn’t it? If table changes are frequent, how many MVCC versions would there be?

| username: Still4 | Original post link

The table structure has remained unchanged for years. The reason for partitioning this time is that one table has become too large, occupying one-third of the cluster’s storage. Moreover, we only need data from the past six months at most. The script for scheduled cleanup has also impacted the cluster’s performance. Since the new version supports converting to partitioned tables, we are considering partitioning first and then cleaning up directly by partition in the future.

| username: zhanggame1 | Original post link

24 hours is not long; I’ve seen ones as long as 72 hours on the forum.