How does DDL ensure execution efficiency?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DDL是怎么保证执行效率的

| username: 小于同学

How does DDL ensure execution efficiency?

| username: zhanggame1 | Original post link

Which specific type of DDL are you referring to?

| username: zhaokede | Original post link

Online DDL efficiency is a holistic matter; specific issues require specific analysis. The original poster should describe the problem more clearly.

| username: TiDBer_H5NdJb5Q | Original post link

I have the same question. Previously, adding a column to a large table in MySQL would directly lock the table, and migrating data was also relatively slow, affecting queries. I understand that TiDB’s online DDL won’t affect queries, but will there still be data migration issues that cause adding a column to be slow?

| username: 小于同学 | Original post link

Add a new index.

| username: lemonade010 | Original post link

According to whether the execution blocks user operations, DDL statements can be divided into:

  • Offline DDL statements: When the database receives a user’s DDL statement, it first locks the database object to be modified, then executes the metadata change. During the execution of the DDL, it will block user operations from modifying the data.
  • Online DDL statements: When the database executes a DDL statement, it uses certain methods to ensure that the DDL execution does not block user operations. It also ensures that user operations can submit modifications during the DDL execution, maintaining the correctness and consistency of the data of the corresponding object.

According to whether the DDL operation needs to manipulate the data included in the target object, DDL statements can be divided into:

  • Logical DDL statements: These usually only modify the metadata of the database object without processing the data stored in the object being changed, such as changing the table name or column name. In TiDB, logical DDL statements are also known as General DDL. The execution time of General DDL is usually short, taking only tens of milliseconds or a few seconds. Executing such DDL statements consumes almost no system resources and therefore does not affect the business load.
  • Physical DDL statements: These not only modify the metadata of the object being changed but also modify the user data stored in the object. For example, creating an index for a table requires changing the table definition and performing a full table scan to build the newly added index. In TiDB, physical DDL is referred to as Reorg DDL (Reorg stands for Reorganization). Currently, physical DDL only includes ADD INDEX and lossy column type changes (e.g., changing from INT to CHAR type). The characteristics of physical DDL are that it takes a long time to execute, and the execution time is related to the amount of data in the table, machine configuration, and business load. Executing physical DDL affects the business load in two ways. On one hand, it needs to read data from TiKV and write new data, consuming TiKV’s CPU and I/O resources. On the other hand, the TiDB node where the DDL Owner is located or the TiDB node scheduled by the TiDB distributed execution framework to execute the ADD INDEX task needs to perform the corresponding calculations, consuming TiDB’s CPU resources.
| username: 友利奈绪 | Original post link

Are you talking about adding a new index, changing or adding a field, or creating a table?

| username: zhaokede | Original post link

Adding an index has almost no impact.

| username: Sword | Original post link

TiDB DDL operations have no impact on business.

| username: zhh_912 | Original post link

Offline DDL statements will first lock the database object to be modified, then execute the metadata changes. During the execution of the DDL, it will block user operations from modifying the data.

| username: yytest | Original post link

Is it online DDL or offline DDL?