[Analyze] Why are analyze options persisted in internal tables instead of being stored in the table schema?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: [Analyze] 为啥analyze options持久化的时候是写内部表而不是存在表的schema里面呢?

| username: TiDBer_D7483dYr

I see this was introduced in pingcap/tidb Pull Request #30939, “*: persist analyze options for manual and auto analyze” by chrysan.

  1. Wouldn’t it be more convenient to store the options in the table schema? Then we wouldn’t have to worry about GC: with the current design, when a table is deleted or renamed, the corresponding configuration has to be deleted from the mysql.analyze_options internal table. Were there other considerations?

  2. The comment on the getAdjustedSampleRate function says that the paper “Random sampling for histogram construction: how much is enough?” proves that, no matter how large the database is, the accuracy is sufficient once the sample size reaches a certain value (around 100k). Does this mean that the number of buckets, the number of top-n values, and so on also have an upper limit? If so, why not make that upper limit the default, so that we never have to adjust it?

| username: xfworld | Original post link

From this perspective, DBAs are going to be out of a job :rofl:

| username: TiDBer_21wZg5fm | Original post link

There is a certain gap between theory and practice.

| username: 像风一样的男子 | Original post link

I suggest raising an issue.

| username: 友利奈绪 | Original post link

I feel it will take some time to achieve.

| username: dba远航 | Original post link

Very comprehensive consideration.

| username: 小于同学 | Original post link

From this perspective, DBAs are going to lose their jobs :rofl: