Error in Using DM Tool to Synchronize and Aggregate the Daily New Table Creation in MySQL Database

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用dm工具,同步汇集mysql库中每天建新表的情况出错

| username: 有猫万事足

[Test Environment for TiDB]
[TiDB Version]
Release Version: v6.5.1
Edition: Community
Git Commit Hash: 4084b077d615f9dc0a41cf2e30bc6e1a02332df2
Git Branch: heads/refs/tags/v6.5.1
UTC Build Time: 2023-03-07 16:12:08
GoVersion: go1.19.5
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: tikv
[DM Version]
v6.6.0
[Reproduction Path]
The upstream log table adds a new table every day. For example, today a new table log_20230427 is created to store data, and tomorrow it will become log_20230428 to store data.
Because I want to aggregate multiple upstream data into one TiDB table,
the DM route configuration is as follows:
route-Log:
  schema-pattern: "China_Log"
  table-pattern: "Log_*"
  target-schema: downStream
  target-table: "Log_date"
  extract-table:
    table-regexp: "Log_(.*)"
    target-column: "c_table"
  extract-schema:
    schema-regexp: "(.*)"
    target-column: "c_schema"
  extract-source:
    source-regexp: "(.*)"
    target-column: "c_source"
route-log_all:
  schema-pattern: "China_Log"
  target-schema: downStream
[Encountered Problem: Phenomenon and Impact]
The current problem is that there is no issue with synchronizing the data of the day and the stock data. However, after a day passes and the upstream generates a new table, DM synchronization will report an error similar to the following:
"Message": "startLocation: [position: (mysql-bin|000001.000001, 4540655), gtid-set: 00000000-0000-0000-0000-000000000000:0], endLocation: [position: (mysql-bin|000001.000001, 4540774), gtid-set: 00000000-0000-0000-0000-000000000000:0]: gen insert sqls failed, sourceTable: China_Log.Log_2023_04_27, targetTable: downStream.Log_date: Column count doesn't match value count: 20 (columns) vs 23 (values)",
indicating that the new table has three fewer fields than the downstream summary table, and synchronization cannot continue.
How can I resolve this?

| username: db_user | Original post link

Was the table created with “create table if not exists”? Please show the table structure and the SQL from this binlog event. Also provide the DM routing configuration; it looks like some transformations were applied.

| username: 有猫万事足 | Original post link

I did not use “create table if not exists”.
Upstream table SQL:

CREATE TABLE `Log_2023_04_27` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `Uid` int(11) DEFAULT NULL,
  `Type` int(11) DEFAULT NULL,
  `ItemId` int(11) DEFAULT NULL,
  `UserCardId` bigint(15) DEFAULT NULL,
  `CardId` int(11) DEFAULT NULL,
  `MasterUserCardId` bigint(15) NOT NULL DEFAULT '0',
  `MasterCardId` int(11) NOT NULL DEFAULT '0',
  `Exp` int(11) DEFAULT NULL,
  `Level` tinyint(4) DEFAULT NULL,
  `UpdateTime` datetime DEFAULT NULL,
  `BeforeSkillNew` int(11) NOT NULL DEFAULT '0',
  `SkillNew` int(11) NOT NULL DEFAULT '0',
  `SuppleSkill` int(11) NOT NULL DEFAULT '0',
  `Evolution` tinyint(4) NOT NULL DEFAULT '0',
  `WashTime` int(11) NOT NULL DEFAULT '0',
  `ContractCondition` int(11) NOT NULL DEFAULT '0',
  `ForgeProcess` int(11) NOT NULL DEFAULT '0',
  `EquipType` int(11) NOT NULL DEFAULT '0',
  `SeniorWashTimes` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`Id`),
  KEY `usercardlog_uid` (`Uid`),
  KEY `type_index` (`Type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Downstream table SQL:

CREATE TABLE `Log_date` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `Uid` int(11) DEFAULT NULL,
  `Type` int(11) DEFAULT NULL,
  `ItemId` int(11) DEFAULT NULL,
  `UserCardId` bigint(15) DEFAULT NULL,
  `CardId` int(11) DEFAULT NULL,
  `MasterUserCardId` bigint(15) NOT NULL DEFAULT '0',
  `MasterCardId` int(11) NOT NULL DEFAULT '0',
  `Exp` int(11) DEFAULT NULL,
  `Level` tinyint(4) DEFAULT NULL,
  `UpdateTime` datetime DEFAULT NULL,
  `BeforeSkillNew` int(11) NOT NULL DEFAULT '0',
  `SkillNew` int(11) NOT NULL DEFAULT '0',
  `SuppleSkill` int(11) NOT NULL DEFAULT '0',
  `Evolution` tinyint(4) NOT NULL DEFAULT '0',
  `WashTime` int(11) NOT NULL DEFAULT '0',
  `ContractCondition` int(11) NOT NULL DEFAULT '0',
  `ForgeProcess` int(11) NOT NULL DEFAULT '0',
  `EquipType` int(11) NOT NULL DEFAULT '0',
  `SeniorWashTimes` int(11) NOT NULL DEFAULT '0',
  `c_table` varchar(100) COLLATE utf8_general_ci NOT NULL COMMENT 'Upstream table name',
  `c_schema` varchar(100) COLLATE utf8_general_ci NOT NULL COMMENT 'Upstream database name',
  `c_source` varchar(100) COLLATE utf8_general_ci NOT NULL COMMENT 'Upstream data source name',
  PRIMARY KEY (`Id`, `c_table`, `c_schema`, `c_source`) /*T![clustered_index] NONCLUSTERED*/,
  KEY `usercardlog_uid` (`Uid`),
  KEY `type_index` (`Type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci /*T! SHARD_ROW_ID_BITS=3 PRE_SPLIT_REGIONS=2*/;

Routing configuration:

mysql-instances:
- source-id: t1
  meta: null
  filter-rules:
  - filter-01
  column-mapping-rules: []
  route-rules:
  - route-UserCardLog
  - route-UserCardLog_all
  block-allow-list: balist-01
routes:
  route-UserCardLog:
    schema-pattern: "China_Log"
    table-pattern: "Log_*"
    target-schema: downStream
    target-table: "Log_date"
    extract-table:
      table-regexp: "UserCardLog_(.*)"
      target-column: "c_table"
    extract-schema:
      schema-regexp: "(.*)"
      target-column: "c_schema"
    extract-source:
      source-regexp: "(.*)"
      target-column: "c_source"
  route-UserCardLog_all:
    schema-pattern: "China_Log"
    target-schema: card
filters:
  filter-01:
    schema-pattern: "China_*"
    table-pattern: "*"
    events:
    - create database
    - drop database
    - create table
    - create index
    - drop table
    - truncate table
    - rename table
    - drop index
    action: Ignore
block-allow-list:
  balist-01:
    do-dbs:
    - "China_Log"
    ignore-tables: []
    ignore-dbs: []
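As a side note, a task file like the one above can be pre-checked with dmctl before the task is started, which catches many routing/pattern mistakes early. This is a sketch assuming the file is saved as task.yaml and a DM master listens on 127.0.0.1:8261 (both placeholders; substitute your own values):

```shell
# Hypothetical master address and file path; adjust to your environment.
tiup dmctl --master-addr 127.0.0.1:8261 check-task ./task.yaml
```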
| username: 有猫万事足 | Original post link

The original routing file was very long. I have now moved this table into a separate task for further observation; it might have been affected by other router/block-allow-list configurations.

| username: 有猫万事足 | Original post link

To add: the downstream table was not created with “create table if not exists,” but the upstream table was. Does this have anything to do with it?

| username: db_user | Original post link

Uh, I’m talking about the upstream. There was a bug with “create table if not exists” a long time ago. You can try it without “if not exists” and see.
Bug during DM synchronization - #11, from lance6716 - TiDB Q&A community.

| username: 有猫万事足 | Original post link

That’s the reason. Thank you very much.

| username: 有猫万事足 | Original post link

For subsequent handling, you can run `tiup dmctl binlog-schema list -s source_name taskname db_name table_name_20230427` and `tiup dmctl binlog-schema list -s source_name taskname db_name table_name_20230428`. Comparing the two shows that using create table if not exists caused these two binlog schemas to be inconsistent.

If the upstream table creation statement cannot be changed, you can run `tiup dmctl binlog-schema update -s source_name taskname db_name table_name_20230428 correct_table_structure.sql` to set the correct tracked schema.

After updating the binlog schema, synchronization can continue. No data loss was observed.
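Putting the steps above together, the recovery workflow might look like the following sketch. The master address is an assumption, and `source_name`, `taskname`, and `correct_table_structure.sql` are the placeholder names from the post:

```shell
# Placeholder names throughout; adjust master address, source, task, schema, and table.
MASTER=127.0.0.1:8261

# 1. Dump the schema DM has tracked for the old and the new daily table.
tiup dmctl --master-addr $MASTER binlog-schema list -s source_name taskname China_Log Log_2023_04_27 > old.sql
tiup dmctl --master-addr $MASTER binlog-schema list -s source_name taskname China_Log Log_2023_04_28 > new.sql

# 2. Compare them; a difference here explains the column/value count mismatch.
diff old.sql new.sql

# 3. Overwrite the tracked schema for the new table with the correct structure.
tiup dmctl --master-addr $MASTER binlog-schema update -s source_name taskname China_Log Log_2023_04_28 correct_table_structure.sql

# 4. Resume the task.
tiup dmctl --master-addr $MASTER resume-task taskname
```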

The issue is thus resolved.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.