Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: mysql迁移tidb
The upstream database is MySQL 5.7, the downstream TiDB version is v6.5.0, and the migration tool used is DM. Since the upstream data has primary keys and auto-increment, and because TiDB requires the upstream MySQL row format to be in row format during migration, the original MySQL format was changed from mixed to row format. However, after the migration, the total number of rows in the same table in TiDB is more than in MySQL. What could be the reason for this?
How to count the number of rows in a table?
Is it a static table or a dynamic table? Is there continuous writing? You can first create a test static table and observe if it works normally.
Is there more downstream than upstream? Did the upstream have a delete operation that didn’t migrate successfully?
The total number of rows counted by count(1).
Dynamic table, but there are no writes for a period of time. The number of rows in the static table is consistent.
The downstream has more data than the upstream, and there are no delete operations in the upstream. Synchronizing static tables shouldn’t be an issue.
That’s strange. Was the downstream table empty before the migration?
If there are more rows, you need to compare how the data has increased. Is the increase normal, or was the original row count in MySQL inaccurate?
The downstream database and tables were not created before the migration.
Is there control over DML permissions downstream?
All counts are done using count(1), and I don’t know why the row count is higher, but the first and last data match. The row format in MySQL is Dynamic, while in TiDB it is Compact. Not sure if this is the reason.
Full permissions were given.
Do other users also have write permissions besides the migration user?
It is using the root user.
TiDB has a comparison tool that you can use to compare specified tables in TiDB and MySQL.
sync-diff-inspector User Documentation | PingCAP Documentation Center
Yes, but I am currently in the process of full and incremental synchronization, so I can’t use this tool. It’s just that I see the data in MySQL hasn’t increased, but the total number of rows in TiDB is more than in MySQL, which feels a bit strange.
Is there an operation on the MySQL side to disable binlog? There is a session variable that can control it.
What are some? Is it the filtering operation in DM? When DM synchronizes, there is only one operation to specify synchronizing one database, and there are no other operations.
Isn’t your downstream data more than upstream now? The data is inconsistent. If your DM is not specially configured, is it possible that some upstream operations did not write to the binlog, causing the downstream to miss them? MySQL has a variable that allows session-level operations to not write to the binlog. Could someone have used this to operate the database? https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_sql_log_bin