Execution Time for Modifying Column is Too Long

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: modify column执行时间过长

| username: Holland

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.1
[Reproduction Path] Operations performed that led to the issue
Modified the length of a field in this table
[Encountered Issue: Problem Phenomenon and Impact]
Execution time is too long, and the ROW_COUNT in ddl jobs is much larger than the Row_count in STATS_META.
Table health is 95
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachment: Screenshot/Log/Monitoring]

| username: Fly-bird | Original post link

Is the data volume large?

| username: Holland | Original post link

More than 2.6 billion rows

| username: h5n1 | Original post link

What kind of modify operation are you doing that requires data backfill? This is causing it to be slow.

| username: Holland | Original post link

Modify the length of a field.

| username: 像风一样的男子 | Original post link

This is slow, estimated to take about ten hours, just have to wait.

| username: Holland | Original post link

It has already been running for more than 60 hours, and why is the row count in show ddl jobs much larger than the actual row count?

| username: 像风一样的男子 | Original post link

The table statistics are inaccurate. Run analyze on the table after the DDL is completed.

| username: Holland | Original post link

Okay, for now, we still have to wait, right?

| username: Holland | Original post link

select count(*) is also more than 2.6 billion, the statistical information should be accurate, right?
Enterprise WeChat Screenshot_bba6dba5-4cfd-4708-8a88-ae3676e6eb24

| username: hey-hoho | Original post link

Please send the SQL for the modification.

| username: Kongdom | Original post link

Could it be that records were inserted in the meantime? After all, it has been running for over 60 hours.

| username: TiDBer_小阿飞 | Original post link

The ROW_COUNT() function returns the number of rows affected by the last executed SQL statement. The execution logic of the ROW_COUNT() function is as follows:

  • If the last statement was a DDL statement, the ROW_COUNT() function will return 0. For example, CREATE TABLE, DROP TABLE, etc.
  • If the last statement was an UPDATE, INSERT, DELETE, ALTER TABLE, or LOAD DATA statement, the ROW_COUNT() function will return the number of affected rows.
  • If the last statement was a SELECT statement that returns a result set, the ROW_COUNT() function will return -1.
  • If the last statement was not a SELECT statement that returns a result set, the ROW_COUNT() function will return the number of affected rows. For example: SELECT * FROM t1 INTO OUTFILE 'file_name'.
| username: xingzhenxiang | Original post link

Since when did DDL take so long? I use TiDB, and apart from indexing, I’ve never seen anything take this long.

| username: 路在何chu | Original post link

This is a bit unusual. We modified the fields of 1.1 billion records, and it was completed very quickly.

| username: tidb菜鸟一只 | Original post link

This backfill will generate twice the amount of data for the entire table, so with this operation, modify_column will probably result in about twice the total data volume, which is around 5.2 billion rows.

| username: Holland | Original post link

It’s been more than 5 days, and it still hasn’t finished running. There are only more than 2.6 billion rows in the table, but the current row count is over 8 billion.

| username: Holland | Original post link

ALTER TABLE xxxx_dw.xxxxx_sku
MODIFY COLUMN xxx_price decimal(16, 4) NULL DEFAULT NULL COMMENT ‘Price’ AFTER xxxx_num

| username: Kongdom | Original post link

:joy: Could it be that there is an interruption and retry in the middle, causing the line count to keep accumulating?

| username: Holland | Original post link

It shouldn’t be possible, TiDB was up 5 days ago.