Changes in the Number of Data Rows When Migrating Data from MySQL to TiDB Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: MySQL迁移数据到TiDB集群数据行数变化

| username: TiDBer_djgos04V

[TiDB Usage Environment] Production Environment
After migrating data from MySQL to the TiDB cluster, using the query command:
SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘data_center’;
I found that the number of rows in TiDB is greater than in MySQL. Why is this?

| username: 啦啦啦啦啦 | Original post link

The statistical information is an estimate, which is not accurate. If you migrate from MySQL to MySQL, you will also find inconsistencies between the two sides.

| username: 像风一样的男子 | Original post link

Is “select count(*) from data_center” also not allowed?

| username: TiDBer_vfJBUcxl | Original post link

This should be normal, just use SELECT COUNT(*) in the data table.

| username: TiDBer_djgos04V | Original post link

There are too many tables in the database, it’s impossible to compare them one by one.

| username: Kongdom | Original post link

This is normal. After analyzing the table, this value will be relatively accurate.

| username: 像风一样的男子 | Original post link

Then you should run an analyze on all the tables.

| username: 啦啦啦啦啦 | Original post link

If both sides are static data, writing a script to batch count is faster. If there is continuous incremental data, you can use sync-diff-inspector to compare the data, which is more accurate than directly looking at the data volume.

| username: cassblanca | Original post link

After importing data, you need to recollect statistics by running ANALYZE XXXX for the system metadata to be accurate, or directly run SELECT COUNT(*) FROM [table_name].

| username: RenlySir | Original post link

Use this to compare the data consistency between MySQL and TiDB.

| username: zhanggame1 | Original post link

Apart from count*, other statistics are also inaccurate. You can use TiDB’s official data comparison tool.

| username: tidb菜鸟一只 | Original post link

For migration comparison, use the sync-diff-inspector tool provided by TiDB. It can compare not only the number of rows but also the data content.

| username: 烂番薯0 | Original post link

Migration errors may occur if the characters in the original data differ from those in TiDB. You can use the sync-diff-inspector tool. sync-diff-inspector User Documentation | PingCAP Documentation Center

| username: redgame | Original post link

Indeed, it is inaccurate.

| username: 昵称想不起来了 | Original post link

Statistics can only be used as a reference. In this case, you might consider writing a script to count key tables and randomly select some data for comparison. If the accuracy of the data is very strict, you can consider using the official tool sync-diff-inspector.

| username: 大飞哥online | Original post link

After the import is completed, if the statistics are not yet finished, you can perform an analyze on the imported environment. This way, the queries will be more accurate since the statistics are not updated in real-time.

| username: 胡杨树旁 | Original post link

Write a script to count(*), and if there are discrepancies, perform sync-diff.