Manual analyze table takes too long

translator_bot · June 21, 2024, 12:59am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 手动analyze table 耗时过长

| username: 像风一样的男子

I have a large table here, and manually analyzing it took 16096.221 seconds. Is there any way to optimize this?

translator_bot · June 21, 2024, 12:59am

| username: RenlySir | Original post link

Which version of TiDB is it? How large is the table data? Is this table a partitioned table?

translator_bot · June 21, 2024, 12:59am

| username: 像风一样的男子 | Original post link

With tens of billions of data, not partitioning it might actually be slower if partitioned.

translator_bot · June 21, 2024, 12:59am

| username: Kongdom | Original post link

Enable fast analysis?

translator_bot · June 21, 2024, 12:59am

| username: TiDBer_jYQINSnf | Original post link

I want to know if this analyze can be executed incrementally, or executed in the background like DDL, because it takes so long to complete. If the connection is lost due to network reasons, will it cause the previous execution to be wasted?

translator_bot · June 21, 2024, 12:59am

| username: Kongdom | Original post link

Normally, the version is marked here.

translator_bot · June 21, 2024, 12:59am

| username: tidb菜鸟一只 | Original post link

Refer to here: 统计信息简介 | PingCAP 文档中心

I think for particularly large tables, data should be collected proportionally. For example, WITH 0.1 SAMPLERATE only collects 10% of the data.

Additionally, I suggest modifying the concurrency of analyze. Increase it appropriately without affecting production applications.

translator_bot · June 21, 2024, 12:59am

| username: RenlySir | Original post link

As mentioned above, adjusting parameters can speed up the analyze process, but it will have an impact on the business.
Increasing the number of TiKV nodes can also appropriately speed it up.
Upgrading to v6.5.8 will also improve the analyze speed.
It is recommended to upgrade.

translator_bot · June 21, 2024, 12:59am

| username: zhaokede | Original post link

For large tables with frequent data additions and deletions, slow optimization is normal. If it were faster, it could impact performance.

translator_bot · June 21, 2024, 12:59am

| username: dba远航 | Original post link

When analyzing, adding a sampling ratio will certainly affect the optimizer’s choice of algorithm.

translator_bot · June 21, 2024, 12:59am

| username: TiDBer_5cwU0ltE | Original post link

You can increase the ANALYZE concurrency and allocate more resources to the analyze operation. Reference link:

translator_bot · June 21, 2024, 12:59am

| username: DBAER | Original post link

Check the documentation, there is a concurrency parameter.

translator_bot · June 21, 2024, 12:59am

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.