In TiDB, which version supports DDL that changes a field type by modifying only the metadata, without rewriting the data?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb做ddl修改字段类型,什么版本可以出只同步修改元数据,而不同步修改数据的操作

| username: wluckdog

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path]
Changing a table's column type is a reorg DDL, which modifies both the metadata and the table data. Rewriting the table data causes TiCDC to lag. In which version does a column-type-change reorg DDL stop causing TiCDC delays?

[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: Billmay表妹 | Original post link

Which version are you using now?

| username: wluckdog | Original post link

Version 6.5.0

| username: zhanggame1 | Original post link

There will be a significant delay. I tested changing a field type on a table with tens of millions of rows, from INT to VARCHAR, and it took several minutes.
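The kind of statement tested, with hypothetical table and column names; INT → VARCHAR is a lossy conversion, so TiDB backfills every row during the DDL:

```sql
-- Hypothetical names. INT -> VARCHAR is a lossy change, so TiDB rewrites
-- (backfills) every row while the DDL runs, and TiCDC picks up those rows.
ALTER TABLE big_table MODIFY COLUMN order_no VARCHAR(32);  -- order_no was INT
```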

| username: 春风十里 | Original post link

Is there any database that can make this kind of DDL change without updating both the metadata and the data?

| username: 连连看db | Original post link

TiDB has supported reorg DDL (lossy column type changes) since v5.1. In v6.2, TiDB adopted a concurrent DDL framework that allows DDL jobs to be processed asynchronously. v6.5 optimizes the impact of DDL on TiCDC's DML replication, executing in the order DML → DDL → DML, and also supports filtering out specified types of DDL events.

| username: wangccsy | Original post link

If a column's type is changed, can the existing data keep its original type?

| username: 小龙虾爱大龙虾 | Original post link

Data backfill is required for lossy changes :joy_cat:, but it shouldn't be needed for lossless ones.

| username: Jellybean | Original post link

TiCDC is a tool for obtaining real-time change logs of TiKV row data. If there is a synchronization delay after modifying the data type, it indicates that the modification of the data type has caused a large amount of data changes.

In TiDB, when does modifying the data type result in data changes? That’s right, it happens during lossy changes, such as converting int to varchar. Typically, this involves changes from a larger range to a smaller range or across different data types. You can refer to the official documentation for specific types that are considered lossy changes.

Conversely, this answers your second question: when the modified data type does not result in lossy changes, there will be no changes to the data rows, and thus no significant data modifications. In this case, TiCDC synchronization delays will not occur, such as converting tinyint to int.
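To illustrate the distinction (hypothetical table and column names, just a sketch):

```sql
-- Lossless widening change (c1 was TINYINT): typically metadata-only,
-- no rows are rewritten, so TiCDC sees no extra row events.
ALTER TABLE t MODIFY COLUMN c1 INT;

-- Lossy cross-type change (c2 was INT): requires a reorg that rewrites
-- every row, and those rewritten rows flow into TiCDC as change events.
ALTER TABLE t MODIFY COLUMN c2 VARCHAR(20);
```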

| username: dba远航 | Original post link

When modifying the metadata of a table, you can specify the type of the column. If you specify the original type, it will not change.

| username: Kongdom | Original post link

:thinking: Could you please specify the type of modification, from what type to what type?
I think changing the field type without triggering data modification is unlikely to be achievable.

| username: 随缘天空 | Original post link

What is your applicable scenario? There should be prerequisites, right? For example, if the original string is “abc” and you forcefully change the type to int, it definitely shouldn’t be possible to modify it.

| username: wluckdog | Original post link

Changing a TIMESTAMP column to DATETIME. I suspect there is a bug with the TIMESTAMP type: with large-scale deletes and highly concurrent transactions, it can lead to data and index inconsistencies.
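The change in question looks roughly like this (hypothetical table and column names). Because TIMESTAMP is stored as UTC while DATETIME stores the literal value, the conversion is lossy and triggers a full-table reorg:

```sql
-- TIMESTAMP (stored as UTC) -> DATETIME (stored as the literal value) is a
-- lossy conversion, so every row is rewritten during the DDL.
ALTER TABLE orders MODIFY COLUMN created_at DATETIME;
```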

| username: 烂番薯0 | Original post link

There will be a delay, but replication will catch up afterwards, right?

| username: Kongdom | Original post link

If the DATETIME column does not specify a fractional-second precision, the default is 0, which keeps only whole seconds. In practice this should be a lossy change, so it seems unlikely that only the metadata would change.
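If the source TIMESTAMP column carries fractional seconds, the precision has to be stated explicitly to keep them; a minimal sketch with hypothetical names:

```sql
-- DATETIME with no precision defaults to DATETIME(0), keeping whole seconds
-- only; specify the fractional-second precision explicitly to preserve it.
ALTER TABLE orders MODIFY COLUMN created_at DATETIME(6);
```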

| username: wluckdog | Original post link

Changing the actual type does touch the actual data, and CDC lags. When I changed a 10 GB table, the volume of data pushed to downstream Kafka surged.

| username: wluckdog | Original post link

You probably just haven't run into the issue. For a 10 GB table, the data written to downstream Kafka can reach one to two hundred GB. If you modify a very large table, downstream Kafka can be overwhelmed.

| username: 烂番薯0 | Original post link

Indeed, we haven’t encountered this issue. We are using downstream MySQL and haven’t had to modify data types yet. If what you say is true, then downstream MySQL might also be unable to synchronize.

| username: TIDB-Learner | Original post link

If there is no business data being written at night, you can stop CDC first.
Personally, I feel the two time types should be convertible losslessly with the right settings, converting TIMESTAMP to DATETIME.

Besides their different storage requirements, DATETIME and TIMESTAMP also differ as follows:
The TIMESTAMP data type converts date and time data to UTC (Coordinated Universal Time) for storage and also converts the data to the current time zone when queried.
The DATETIME data type stores the exact time inputted and returns the same time when queried, without any time zone conversion.
The initial values for DATETIME and TIMESTAMP can be set to null, any timestamp within the valid range, or the current timestamp.
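A small session-level demonstration of the time-zone difference (hypothetical table name):

```sql
CREATE TABLE tz_demo (ts TIMESTAMP, dt DATETIME);

SET time_zone = '+00:00';
INSERT INTO tz_demo VALUES ('2024-01-01 00:00:00', '2024-01-01 00:00:00');

SET time_zone = '+08:00';
-- ts is converted to the session time zone (2024-01-01 08:00:00),
-- while dt comes back exactly as stored (2024-01-01 00:00:00).
SELECT ts, dt FROM tz_demo;
```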

| username: tidb菜鸟一只 | Original post link

Wouldn’t changing varchar(10) to varchar(5) be equivalent to setting the column to SUBSTR(col, 1, 5) of its current value?
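Something like this (hypothetical names), although in strict SQL mode the reorg typically fails on rows longer than the new length rather than silently truncating them:

```sql
-- Shrinking a VARCHAR is a lossy change and rewrites every row.
-- In strict SQL mode, existing values longer than 5 characters typically
-- cause the DDL to fail with a truncation error instead of being cut short.
ALTER TABLE t MODIFY COLUMN c VARCHAR(5);  -- c was VARCHAR(10)
```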