Ask tiflash | Questions about MPP task and stage

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ask tiflash | 关于 MPP task 和 stage 的疑问

| username: BigC

【TiDB Usage Environment】
Production Environment
【TiDB Version】
6.5
【Reproduction Path】
N/A
【Encountered Issues: Problem Phenomenon and Impact】
The TiFlash documentation, in the section on the executor thread model, contains this passage:
“The TiDB optimizer will generate an MPP plan for this query based on rules and cost. Each MPP plan will be divided into multiple stages, and each stage will be instantiated into several MPP tasks.”
Questions:

  1. What is the basis for dividing into multiple stages?
  2. When instantiated into several MPP tasks, what is the internal mechanism for calculating the actual number of tasks? Are they evenly distributed to all TiFlash nodes?
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】
| username: Billmay表妹

Regarding your questions:

  1. What is the basis for dividing into multiple stages?

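An MPP plan is cut into stages wherever data has to move between nodes. In the plan these boundaries show up as ExchangeSender / ExchangeReceiver operator pairs, which you can see in the EXPLAIN output of an MPP query. Each ExchangeSender ends one stage and the matching ExchangeReceiver feeds the next, so everything between two exchanges can run on a node without redistributing data. Where the exchanges are placed, and of which type (for example HashPartition or Broadcast), is what the optimizer decides based on rules and cost.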
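To make the stage idea concrete, here is a minimal sketch that assumes stages are cut at ExchangeSender operators, as they appear in the EXPLAIN output of an MPP plan. The `Op` class, the `split_into_stages` function, and the sample plan are all hypothetical and only illustrate the idea; this is not TiDB's planner code.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Op:
    """A node in a toy plan tree (hypothetical, not a TiDB type)."""
    name: str
    children: list[Op] = field(default_factory=list)


def split_into_stages(root: Op) -> list[list[str]]:
    """Cut the plan at every ExchangeSender and return operator names per stage."""
    stages: list[list[str]] = []

    def walk(op: Op, current: list[str]) -> None:
        current.append(op.name)
        for child in op.children:
            if child.name == "ExchangeSender":
                # Assumption: a sender below the current operator starts a new stage.
                new_stage: list[str] = []
                stages.append(new_stage)
                walk(child, new_stage)
            else:
                walk(child, current)

    top: list[str] = []
    stages.append(top)
    walk(root, top)
    return stages


# Toy plan: two table scans shuffled into a join, then sent back to TiDB.
plan = Op("ExchangeSender", [
    Op("HashJoin", [
        Op("ExchangeReceiver", [Op("ExchangeSender", [Op("TableScan_a")])]),
        Op("ExchangeReceiver", [Op("ExchangeSender", [Op("TableScan_b")])]),
    ]),
])

stages = split_into_stages(plan)
for i, stage in enumerate(stages):
    print(i, stage)
```

In this toy plan the join and its two receivers form one stage, and each table scan with its sender forms another, giving three stages in total.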

  2. When instantiated into several MPP tasks, how is the number of tasks calculated? Are they evenly distributed to all TiFlash nodes?

As far as I understand, the number of tasks in a stage follows the TiFlash nodes that take part in it: roughly one MPP task per node per stage. For a stage that scans a table, TiDB builds the tasks from the nodes that hold the Regions to be read. For a stage that only receives exchanged data, the tasks follow the nodes of the stages that feed it. So the tasks are usually spread across the TiFlash nodes, but a node that serves none of the data for the query is not guaranteed to get one. TiDB dispatches the tasks to the TiFlash nodes directly. TiFlash Proxy handles the Raft replication of data from TiKV and is not involved in distributing MPP tasks.
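A rough way to picture the task count is the simplified model below. It assumes one task per participating TiFlash node per stage, with scan stages following the nodes that hold the Regions to read and other stages following the nodes of their child stages. The function names, node names, and Region IDs are hypothetical, and the real scheduler in TiDB has more cases than this sketch covers.

```python
def tasks_for_scan_stage(region_to_node: dict[int, str]) -> dict[str, list[int]]:
    """Group Regions by the node serving them: one task per node (assumed model)."""
    tasks: dict[str, list[int]] = {}
    for region_id, node in sorted(region_to_node.items()):
        tasks.setdefault(node, []).append(region_id)
    return tasks


def tasks_for_compute_stage(child_task_nodes: list[set[str]]) -> set[str]:
    """One task per node that runs any child-stage task (assumed model)."""
    return set().union(*child_task_nodes)


# Hypothetical cluster with three TiFlash nodes; the table's Regions
# happen to be served by only two of them.
regions = {1: "tiflash-0", 2: "tiflash-1", 3: "tiflash-0"}

scan_tasks = tasks_for_scan_stage(regions)
print(scan_tasks)  # {'tiflash-0': [1, 3], 'tiflash-1': [2]}

join_tasks = tasks_for_compute_stage([set(scan_tasks)])
print(sorted(join_tasks))  # ['tiflash-0', 'tiflash-1']
```

Under these assumptions the scan stage gets two tasks rather than three, which shows why the task count depends on where the data lives and not only on how many TiFlash nodes the cluster has.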