How is statistical information (Statistics) synchronized between multiple TiDB instances?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 多个TiDB之间的统计信息(Statistic)是如何同步的?

| username: TiDBer_0FIjSxWX

[TiDB Version] Latest version or older versions
[Encountered Problem: Phenomenon and Impact]
I have read multiple TiDB-related documents and blogs, but I haven’t seen how the statistics (Statistic) of multiple TiDBs are propagated or synchronized between TiDB clusters. Could you please clarify:

  1. If I deploy multiple TiDBs, will the statistics (Statistic) of each TiDB be synchronized or exchanged between the clusters?
  2. If the statistics are synchronized or propagated between the clusters, how is it implemented?
  3. Will there be discrepancies in the statistics between multiple TiDBs? If so, will it cause any adverse effects?
| username: tidb菜鸟一只 | Original post link

The table’s statistics are stored in TiKV.

  1. If there are multiple TiDB servers, their statistics are read from TiKV, so there is no synchronization issue.
  2. No synchronization is needed because everything is persisted in TiKV.
  3. There might be deviations. First, each TiDB server loads statistics into memory upon startup, and if they start at different times, the loaded statistics might differ. Additionally, even if the statistics are in memory, sometimes the latest statistics need to be loaded from TiKV, which can occasionally fail due to timeout. This means that different TiDB servers might not always successfully load the statistics, leading to different execution plans. This could result in SQL execution not following the most optimized plan. In such cases, manual intervention might be necessary.
| username: TiDBer_0FIjSxWX | Original post link

Thank you very much!!!

| username: zhanggame1 | Original post link

  1. Statistics will be persisted to TiKV, and TiDB nodes will synchronize from TiKV.

  2. There are deviations, and startup loading is relatively slow. Additionally, some people have encountered issues with different execution plans on different TiDB nodes.

| username: TiDBer_0FIjSxWX | Original post link

Thank you very much!!

| username: TiDBer_aKu9dgpb | Original post link

  1. If the statistics are synchronized or propagated between clusters, how is it implemented?
    When the tidb-server starts, it pulls the cached statistics from kv to tidb. Subsequent synchronization updates occur during the CBO process.
| username: TiDBer_Lee | Original post link

Table-level statistics import and export can be performed.

| username: wangkk2024 | Original post link

Learned.

| username: kelvin | Original post link

Learned.

| username: zhang_2023 | Original post link

TiDB does not have it, TiKV synchronizes directly.

| username: 小于同学 | Original post link

Statistics will be persisted to TiKV, and TiDB nodes will synchronize from TiKV.

| username: QH琉璃 | Original post link

Learned a lot, thank you!

| username: jiayou64 | Original post link

Learned.

| username: TIDB-Learner | Original post link

You can refer to the documentation 常规统计信息 | PingCAP 文档中心

| username: TiDBer_QYr0vohO | Original post link

Learned.

| username: dba远航 | Original post link

Statistics will be persisted to TiKV, and TiDB nodes will read from TiKV.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.