[DM] Add a task configuration option to control whether indexes are created when tables are created

translator_bot · June 22, 2024, 11:23am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【DM】task任务新增配置项，支持建表时是否建索引

| username: fluent

Requirement Feedback
[Problem Scenario Involved in the Requirement]
I want to use TiDB as a data middle platform, which requires full plus incremental data synchronization. Because the business tables have many indexes, the load phase of DM takes a long time.
My current workaround is to export the table structure, remove the indexes, create the tables manually in the downstream, and then start the task.
[Expected Required Behavior]
Could a new configuration item be added to the task to control whether indexes are created when DM creates the tables?

translator_bot · June 22, 2024, 11:23am

| username: WalterWj | Original post link

Do you need a filter that skips synchronization of the index-creation DDL?

translator_bot · June 22, 2024, 11:23am

| username: WalterWj | Original post link

events: ["truncate table", "drop table"] # Which event types to match

A similar configuration can be found here: Introduction to the Complete DM Task Configuration File | PingCAP Documentation Center

translator_bot · June 22, 2024, 11:23am

| username: Jellybean | Original post link

If the index is already part of the table definition when the table is created, it is created along with the table during synchronization.

If the index is added later to an existing table and you want to filter out those later index DDLs, I recall that DM's filtering can be set up for this. For example, you can add binlog event filters that skip the "alter table xx add index xxx" operations for specified databases and tables. This should meet your needs.
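
As a rough illustration of the kind of binlog event filter described above, the task file fragment below ignores index-creation DDL for selected tables. The rule name, the schema and table patterns, the source ID, and the regular expressions are all assumptions made up for this sketch, not values from this thread, so please check the filter syntax against the DM documentation for your version before relying on it.

```yaml
# Sketch only: every name and pattern below is hypothetical.
filters:
  ignore-add-index:                  # illustrative rule name
    schema-pattern: "app_db"         # assumed upstream schema
    table-pattern: "orders*"         # assumed upstream tables
    sql-pattern:
      # ALTER TABLE ... ADD [UNIQUE] INDEX|KEY ...
      - "(?i)^ALTER\\s+TABLE\\s+[\\s\\S]*\\bADD\\s+(UNIQUE\\s+)?(INDEX|KEY)\\b"
      # CREATE [UNIQUE] INDEX ...
      - "(?i)^CREATE\\s+(UNIQUE\\s+)?INDEX\\b"
    action: Ignore                   # skip matching statements

mysql-instances:
  - source-id: "mysql-replica-01"    # assumed source ID
    filter-rules: ["ignore-add-index"]
```

A rule like this only applies to DDL replicated from the binlog. It does not change the table structure that DM exports and creates during the full phase.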

translator_bot · June 22, 2024, 11:23am

| username: fluent | Original post link

Thank you for your reply. During its full synchronization phase, DM exports the structure of every table, and that structure includes the index definitions. Because the indexes already exist when the data is loaded, the full import is very slow.

From the community, I learned that you can first create the table structure yourself without including indexes, and then import the data. If DM detects that the database and tables already exist, it will not create them again.

The filter you mentioned applies to the sync phase, where it filters out index-creation operations.

My request concerns DM's automatic full synchronization, which exports the table structure together with its indexes. Could a switch be added there so that the table structure does not have to be created manually each time?