When DM synchronizes upstream data tables to the downstream database, can the table names be uniformly specified with a prefix or suffix in bulk?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM 同步上游数据表到下游数据库中时,表名是否可以统一批量指定前缀或后缀?

| username: TiDBer_mvrpvfaN

When DM synchronizes upstream data tables to the downstream database, can the table names be uniformly specified with a prefix or suffix in bulk?
For example: Incrementally synchronize all tables from the upstream database A to the downstream database B through DM.
Source database table names: A.a, A.B, A.c
Target database table names: B.ods_aliy_a, B.ods_aliy_B, B.ods_aliy_c

At the same time, is it possible to automatically synchronize a newly added table A.d in the source database to the downstream table B.ods_aliy_d without needing to configure table-level mapping parameters in DM?

| username: Anna | Original post link

Refer to this: "Batch modifying MySQL table name prefixes" (CSDN blog)

| username: 有猫万事足 | Original post link

I'd recommend picking up a scripting language and generating the required route rules automatically. Then you can map them however you want.

If there are multiple homogeneous databases upstream that need to be processed, it becomes even more convenient.

Later on, you can also enable the DM OpenAPI to manage sources and tasks.
If you write some code to manage DM through it, you will find it much more pleasant than using dmctl.
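
For illustration, here is a minimal sketch of that idea (not an official tool): a short Python script that, given a list of upstream tables, emits one route rule per table with an ods_aliy_ prefix plus a task-file fragment referencing them. The table list, source ID, and schema names below come from this thread or are invented for the example, so verify the exact task-file keys against the DM documentation for your version.

```python
# generate_routes.py -- a minimal sketch: emit one DM route rule per upstream table.
# The table list, source-id, schema names, and prefix are examples from this thread;
# adjust them and verify the task-file keys against your DM version's documentation.
import yaml

UPSTREAM_SCHEMA = "A"
TARGET_SCHEMA = "B"
PREFIX = "ods_aliy_"
TABLES = ["a", "B", "c"]  # could also be queried from the upstream information_schema

routes = {}
for t in TABLES:
    routes[f"route-{t}"] = {
        "schema-pattern": UPSTREAM_SCHEMA,
        "table-pattern": t,
        "target-schema": TARGET_SCHEMA,
        "target-table": f"{PREFIX}{t}",
    }

task_fragment = {
    "routes": routes,
    "mysql-instances": [
        {
            "source-id": "mysql-replica-01",  # hypothetical source name
            "route-rules": sorted(routes.keys()),
        }
    ],
}

# Paste the output into the DM task configuration file.
print(yaml.safe_dump(task_fragment, sort_keys=False))
```

Note that a newly added upstream table such as A.d still requires re-running the script and updating the task, so the "new table automatically gets the prefix downstream" part remains a manual (or scheduled) step.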

| username: TiDBer_mvrpvfaN | Original post link

Thank you, but I was referring to renaming the table during DM synchronization.

| username: TiDBer_mvrpvfaN | Original post link

Thank you for providing the solution. For now we're going with this approach because it's simple and convenient, as we don't have enough people to do further research or secondary development.

| username: 有猫万事足 | Original post link

Alright.
If there are no more than five upstream databases, and the synchronized tables involve neither shard merging nor deciding which DDL or DML statements should be applied downstream, then maintaining this DM setup afterwards is relatively easy. It is basically manageable, provided you are also the developer of the upstream systems.

Otherwise, trust me, writing these route rules by hand is a really tedious job, and it won't run smoothly afterwards either. The more complex the upstream sharding strategy and the more irregular the database usage, the more unexpected situations you will run into. The logic is simple: if someone keeps dumping garbage upstream, don't expect a peaceful time downstream. :joy:

| username: TiDBer_mvrpvfaN | Original post link

Yes, and that's exactly why I'm considering TiDB as a data warehouse. When using DM to synchronize business data into the ODS layer of the warehouse, I run into the problem of strictly following our development standards (layer naming conventions). Following those conventions strictly means adding prefixes such as ods_sourceDB_sourceTable, which currently requires a table-name mapping for every single table, and whenever a new table is added upstream I also have to add its mapping by hand.

So the trade-off I'm considering now is to pass the incremental data through at the database level as-is. For the few sharded tables involved, I will merge them and configure custom mappings separately (there aren't many of them, so this stays manageable).

So, weighing strict standards against fast development, and lacking a better solution, my first choice was to have DM extract the entire database "as is."

| username: tidb菜鸟一只 | Original post link

For source databases where the data is sharded or partitioned, consider configuring routing rules to merge the upstream data into specified downstream tables. As for specifying prefixes or suffixes for table names in bulk, the only solution I can think of at the moment is to configure an individual routing rule for each table.
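
As a rough sketch of the two kinds of rules mentioned here (rule and table names are invented; double-check the keys against the DM documentation for your version): a shard-merge rule can use a wildcard table-pattern to send many upstream shards into one downstream table, whereas prefix renaming needs one rule per table because target-table is a literal name.

```python
# A rough sketch of the two rule types discussed in this thread.
# Rule names and table names are invented; verify the keys against the DM docs.
import yaml

routes = {
    # Shard merging: a wildcard table-pattern routes many upstream shards
    # (orders_0001, orders_0002, ...) into a single downstream table.
    "merge-orders": {
        "schema-pattern": "A",
        "table-pattern": "orders_*",
        "target-schema": "B",
        "target-table": "ods_aliy_orders",
    },
    # Prefix renaming: target-table is a literal name, so every renamed table
    # needs its own rule -- which is why bulk prefixing means one rule per table.
    "rename-a": {
        "schema-pattern": "A",
        "table-pattern": "a",
        "target-schema": "B",
        "target-table": "ods_aliy_a",
    },
}

print(yaml.safe_dump({"routes": routes}, sort_keys=False))
```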

| username: 有猫万事足 | Original post link

It sounds like you only need to rename the tables and can generate the route rules in bulk for a full-database extraction. However, pay extra attention to whether the upstream QPS/TPS is high: if write concurrency is already high or the table data volume is large, you need to design the downstream table structure carefully to avoid write hotspots during both the import and the incremental synchronization stages.

If you don’t prepare the downstream table structure in advance and just move the MySQL table structure downstream, you’ll have to start over if write hotspots occur.

Basically, your task is relatively simple: writing the route rules (manual work) and planning the table structure of the high-concurrency write tables to prevent write hotspots should be no problem.
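
To make the hotspot point concrete, here is a minimal sketch (not from the thread) of pre-creating one downstream table in TiDB with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS before starting the DM task. The connection details and column definitions are placeholders, and the right options depend on your primary-key design (for example, AUTO_RANDOM for an auto-increment primary key).

```python
# Pre-create a high-write downstream table in TiDB before starting the DM task,
# so the full import and incremental writes are spread across multiple Regions.
# Connection parameters and the column definitions are placeholders.
import pymysql

DDL = """
CREATE TABLE IF NOT EXISTS B.ods_aliy_a (
    id BIGINT NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id) NONCLUSTERED  -- nonclustered PK so _tidb_rowid sharding applies
) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 3
"""

conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="root", password="", database="B")
try:
    with conn.cursor() as cur:
        cur.execute(DDL)
    conn.commit()
finally:
    conn.close()
```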

| username: TiDBer_mvrpvfaN | Original post link

Yes, we are currently considering using the solution you mentioned.

| username: redgame | Original post link

Yes, when using DM to synchronize upstream data tables to the downstream database, you can specify the table name prefix or suffix in batches.

| username: Anna | Original post link

Alright, let’s take another look.

| username: Hacker007 | Original post link

Yes, you can specify target-table, but if there are many tables it becomes quite a chore, since each one has to be configured individually.

| username: ljluestc | Original post link

Yes, you can use the DM (Data Migration) tool to specify prefixes or suffixes for table names during the synchronization process. DM allows you to configure table name mapping between the source and target databases.

To achieve the desired result, you can use the routing rules configuration in DM to define the table name mapping. Here is an example configuration for your scenario:

route-rules:
  - schema-pattern: "A"
    table-pattern: ".*"
    target-schema: "B"
    target-table: "ods_aliy_$1"

In this example, the schema pattern is set to “A” to match the source schema name. The table pattern is set to “.*” to match all tables in the source schema. The target-schema is set to “B” to specify the target schema name, and the target-table is set to “ods_aliy_$1” to specify the target table name, where the “$1” placeholder represents the matching table name from the source.

With this configuration, all tables in the “A” schema will be synchronized to the “B” schema in the target database with the prefix “ods_aliy_” added to the table names.

When you add a new table “A.d” in the source database, DM will automatically synchronize it to the target database with the name “B.ods_aliy_d” based on the configured table name mapping.

Make sure to include this configuration in the DM configuration file (e.g., dm-worker-1.toml) and restart the DM Worker to apply the changes.

Note: The actual DM configuration file syntax may vary depending on the version of DM you are using. The example provided above is based on DM 2.0 syntax. Please refer to the official DM documentation for the specific version you are using to ensure accurate configuration.

| username: ajin0514 | Original post link

This is a very good case.

| username: 飞-田鼠 | Original post link

How should this be configured? I tried your configuration in DM v5.4.0 and it doesn't work. Does DM not support this configuration?

| username: zhh_912 | Original post link

Yes, this is currently the general approach.