Sync-diff-inspector executing SHOW STATS_BUCKETS WHERE db_name= ? AND table_name= ? takes 7 hours

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sync-diff-inspector 执行SHOW STATS_BUCKETS WHERE db_name= ? AND table_name= ?耗时7小时

| username: 刘亚欣1

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.0
[Reproduction Path] Migration from MySQL to TiDB, in a scenario of merging databases, migrating data from multiple MySQL instances to a single TiDB instance. The migration proceeds normally. The number of tables migrated to TiDB is enormous, nearly 900,000, but each table contains relatively little data, with most being empty tables. During the validation process using sync-diff-inspector, it takes an excessively long time.
[Encountered Problem: Phenomenon and Impact]
During the validation process using sync-diff-inspector, the command SHOW STATS_BUCKETS WHERE db_name= ? AND table_name= ?; is automatically executed to obtain statistical information. However, this command executes very slowly, taking up to 7 hours, making it impossible to use the tool for validation properly.
[Resource Configuration] The configuration is relatively high, with neither CPU nor memory being fully utilized.
[Attachments: Screenshots/Logs/Monitoring]
Slow log:


SHOW PROCESSLIST:

| username: Billmay表妹 | Original post link

  1. It is recommended to first check the slowlog to see the slow SQL running in sync-diff at that time and analyze the situation.
  2. Adjust the chunk-size to 200,000 to 500,000 and try again. (Pay attention to CPU usage)
| username: Billmay表妹 | Original post link

The topic of this post looks similar to the situation you encountered, you can refer to it~

| username: 刘亚欣1 | Original post link

Okay, thank you~. By using show processlist, you can see the currently running slow SQL, which are all SHOW STATS_BUCKET statements, while the slowlog records those that have already been executed.

Currently, forcing the interruption of running slow SQL through KILL TIDB can solve this problem. However, after executing KILL, the connection is not immediately terminated and takes a long time. What is the main reason for this?

| username: yilong | Original post link

Is executing SHOW STATS_BUCKETS in TiDB also very slow? Could it be that there is too much statistical information?

| username: 刘亚欣1 | Original post link

It’s very slow. How can I check if there is too much statistical information? How should I handle it?

| username: yilong | Original post link

  1. Based on your description “the table data is huge, nearly 900,000 tables”, it should be that there are too many tables. As a result, there will be more bucket information. May I ask if you have seen how many bucket data there are currently?
    常规统计信息 | PingCAP 文档中心.
  2. Is the db_name and table_name seen in the processlist constantly changing? Or is it stuck on a certain table?
| username: 刘亚欣1 | Original post link

  1. By checking the mysql.stats_buckets table, the total number of rows is: 207356
    image
  2. In the current phenomenon, show processlist does not show status_buckets where …

    Looking at the logs, it gets stuck here and does not update