How to Understand and Optimize the Concurrency of TiFlash?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何理解TiFlash 的并发数并进行调优?

| username: wzf0072

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.5.2
【Reproduction Path】Operations performed that led to the issue
【Encountered Issue: Problem Phenomenon and Impact】

How to understand and tune the concurrency of TiFlash?
TiFlash computation concurrency: tidb_opt_tiflash_concurrency_factor
Maximum concurrency for request execution in TiFlash: tidb_max_tiflash_threads
The system frequently shows high TiFlash Request Duration, which leaves a large number of SQL statements waiting until they reach the maximum execution time and time out.
tidb_max_tiflash_threads (introduced in v6.1.0)

  • Scope: SESSION | GLOBAL
  • Persisted to cluster: Yes
  • Type: Integer
  • Default value: -1
  • Range: [-1, 256]
  • Unit: Threads
  • Sets the maximum concurrency for request execution in TiFlash. The default value is -1, meaning this system variable does not take effect. 0 means the value is set automatically by TiFlash.

tidb_opt_tiflash_concurrency_factor

  • Scope: SESSION | GLOBAL
  • Persisted to cluster: Yes
  • Type: Float
  • Range: [0, 2147483647]
  • Default value: 24.0
  • Indicates the concurrency of TiFlash computation. This variable is used internally by the cost model, and modifying it is not recommended.
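
As a minimal sketch, both variables described above can be inspected and set from any SQL client. Setting the global value to 0 here is only an illustration of the "set automatically by TiFlash" option, not a recommendation for this workload.

```sql
-- Session-level values currently in effect.
SHOW VARIABLES LIKE 'tidb_max_tiflash_threads';
SHOW VARIABLES LIKE 'tidb_opt_tiflash_concurrency_factor';

-- Cluster-wide (GLOBAL) value of the thread limit.
SHOW GLOBAL VARIABLES LIKE 'tidb_max_tiflash_threads';

-- Illustration only: let TiFlash choose the per-request thread count.
-- Valid values are -1 (variable not in effect), 0 (automatic), or 1 to 256.
SET GLOBAL tidb_max_tiflash_threads = 0;
```

A GLOBAL change is persisted to the cluster, so it is usually safer to try a value at SESSION scope first.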

【Resource Configuration】

【Attachments: Screenshots/Logs/Monitoring】
Screenshot during fault period 1:


Screenshot during fault period 2:


SQL execution timeout:

| username: Billmay表妹

TiFlash concurrency is controlled from TiDB through system variables, not through the TiDB configuration file. tidb_opt_tiflash_concurrency_factor describes the concurrency of TiFlash computation and is used internally by the optimizer's cost model. Its default value is 24, and modifying it is not recommended. It does not limit how many query requests TiDB sends to TiFlash simultaneously.

The execution concurrency of each request in TiFlash is set with the tidb_max_tiflash_threads system variable. Its default value is -1, meaning the variable does not take effect. A value of 0 lets TiFlash choose the concurrency automatically, and a positive value (up to 256) sets the maximum number of threads a request can use.

When the system experiences TiFlash Request Duration delays, it may be due to the following reasons:

  1. Insufficient TiFlash resources: Both the concurrency of TiFlash and the execution concurrency of each request consume system resources. If the resources (such as CPU, memory, disk) of the TiFlash node are insufficient, it may lead to request delays. You can monitor the resource usage of the TiFlash node to determine if there is a resource bottleneck and adjust resources as needed.

  2. Heavy query load: If the TiFlash node receives a large number of query requests simultaneously, exceeding its processing capacity, it will also lead to request delays. You can monitor the query load of the TiFlash node to determine if there is an overload situation and consider optimizing queries or increasing the number of TiFlash nodes to share the load.
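
One way to start checking for these two causes from SQL is the cluster diagnostics tables in INFORMATION_SCHEMA. This is a sketch that assumes the TiFlash nodes report to CLUSTER_INFO and CLUSTER_LOAD in your deployment; the monitoring dashboards where TiFlash Request Duration is shown remain the primary source.

```sql
-- List the TiFlash nodes known to the cluster.
SELECT type, instance, version
FROM information_schema.cluster_info
WHERE type = 'tiflash';

-- Sample the CPU load currently reported by those nodes.
SELECT instance, device_type, device_name, name, value
FROM information_schema.cluster_load
WHERE type = 'tiflash'
  AND device_type = 'cpu';
```

If the load is consistently high on every TiFlash node during the fault period, resource shortage or query overload is the more likely explanation than a thread-limit setting.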

For tuning TiFlash request delays, you can consider the following measures:

  1. Be cautious with tidb_opt_tiflash_concurrency_factor: it is used internally by the cost model and modifying it is not recommended, so changing it is unlikely to be the right way to control how many requests TiDB sends to TiFlash.

  2. Adjust the execution concurrency of TiFlash requests: Based on the actual situation, adjust the value of the tidb_max_tiflash_threads system variable to increase or decrease the maximum number of threads each request can use.

  3. Increase the number of TiFlash nodes: If the load on the TiFlash node is too heavy, consider increasing the number of TiFlash nodes to improve overall processing capacity.

  4. Optimize query statements: For query statements that frequently access TiFlash, you can optimize the query statements to reduce the amount of data and computation, thereby improving query performance.
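
To try the second measure without affecting other workloads, the thread limit can be changed for a single session and compared before and after. In this sketch the table name `orders` is hypothetical and the value 8 is an arbitrary illustration, not a recommended setting.

```sql
-- Set the per-request TiFlash thread limit for this session only (8 is illustrative).
SET SESSION tidb_max_tiflash_threads = 8;

-- Hypothetical table. EXPLAIN ANALYZE actually runs the query; check the
-- "task" column of the output for mpp[tiflash], batchCop[tiflash] or
-- cop[tiflash] operators, and compare execution time with the earlier run.
EXPLAIN ANALYZE
SELECT COUNT(*) FROM orders;

-- Return to the default, under which the variable does not take effect.
SET SESSION tidb_max_tiflash_threads = -1;
```

The same EXPLAIN ANALYZE output also helps with the fourth measure, since it shows which operators read the most rows from TiFlash.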

Please note that tuning TiFlash requires adjustments based on specific system configurations and load conditions. It is recommended to perform performance analysis and monitoring before tuning to better understand system bottlenecks and optimization directions.

I hope this information is helpful to you. If you have any further questions, please feel free to ask.

| username: wzf0072

My thoughts are now clear, thank you very much :dizzy:
