[TiDB Usage Environment] Production Environment
[TiDB Version] V6.5.8
[Encountered Problem: Problem Phenomenon and Impact] In the official documentation I could only find best practices for write scenarios (TiDB 高并发写入场景最佳实践 | PingCAP 文档中心, i.e. "Best Practices for Highly Concurrent Writes" in the PingCAP docs). Does anyone have best practices for highly concurrent read scenarios in TiDB?
TiDB's storage engine (RocksDB inside TiKV) uses an LSM-tree structure, which is inherently better suited to writing than reading: writes are append-only, so a single read may have to look through multiple levels and multiple versions of a key. Combined with the MVCC mechanism, workloads dominated by highly concurrent reads are not TiDB's strongest suit.
If the reads are hot reads, you can enable the small-table cache or Follower Read. For ordinary concurrent reads, spread the table's Regions evenly across the TiKV nodes so that reads can be served by multiple TiKV instances in parallel.
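For reference, a minimal sketch of those knobs in SQL (the table names `hot_config` and `orders`, the split key range, and the Region count are made-up placeholders; check the docs for your version before copying):

```sql
-- Cache a small, rarely-changed hot table entirely in TiDB memory (cached tables).
ALTER TABLE hot_config CACHE;

-- Send this session's reads to follower replicas instead of only Region leaders.
SET SESSION tidb_replica_read = 'follower';

-- Pre-split a large table into multiple Regions so PD can scatter them across
-- TiKV nodes and concurrent reads fan out over more stores.
SPLIT TABLE orders BETWEEN (0) AND (100000000) REGIONS 16;
```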
There are no hot reads, but the data volume is very large and the SQL is fairly simple (though it performs a lot of index lookups, i.e. fetching rows back from the table after the index scan). Even after deploying 24 TiKV instances, stress testing shows that the unified read pool utilization is very high.
You can also tune the unified read pool (the UnifyReadPool thread pool).
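A minimal sketch of that tuning, assuming your version allows changing `readpool.unified.max-thread-count` online (if not, set it in tikv.toml and restart); the value 20 is only an illustration and should be sized against the CPU cores actually available to each TiKV instance:

```sql
-- Inspect the current unified read pool thread cap on every TiKV instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'readpool.unified.max-thread-count';

-- Raise the cap online (the default is roughly 80% of the machine's CPU cores).
SET CONFIG tikv `readpool.unified.max-thread-count` = 20;
```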
If the data really is balanced, the SQL execution plans are fine, and each scan touches only a small amount of data, but the number of concurrent SQL statements is very high, then you may genuinely need to scale up the resources of the existing TiKV nodes or add more TiKV nodes…
If there are also OLAP-style requests (report queries) that scan large volumes of data, you can deploy TiFlash nodes to take that pressure off the TiKV nodes…
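As a sketch, with a hypothetical report table `fact_orders`:

```sql
-- Build columnar replicas of the table on TiFlash so large analytical scans
-- are served there instead of by TiKV.
ALTER TABLE fact_orders SET TIFLASH REPLICA 2;

-- Wait until AVAILABLE = 1 and PROGRESS = 1 before expecting the optimizer to use TiFlash.
SELECT TABLE_SCHEMA, TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'fact_orders';
```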
That is a reasonable way to test: keep increasing concurrency until the SQL latency exceeds your expectation (or levels off at the expected value), and take the concurrency reached at that point as your baseline to judge whether it meets the requirement. If it doesn't, check whether the unified read pool or some other indicator has hit its limit, find the actual bottleneck, and then decide what to optimize.
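One rough way to watch this from SQL while ramping up concurrency (assuming the diagnostics tables below are available in your version; the Grafana TiKV-Details > Thread CPU panel shows the unified read pool utilization directly and with history):

```sql
-- Per-statement latency and Coprocessor time: does latency degrade as concurrency grows?
SELECT DIGEST_TEXT, EXEC_COUNT, AVG_LATENCY, AVG_PROCESS_TIME, AVG_WAIT_TIME
FROM information_schema.cluster_statements_summary
ORDER BY EXEC_COUNT DESC
LIMIT 10;

-- Rough per-instance CPU load on the TiKV nodes during the test.
SELECT INSTANCE, DEVICE_NAME, NAME, VALUE
FROM information_schema.cluster_load
WHERE TYPE = 'tikv' AND DEVICE_TYPE = 'cpu';
```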
Every company has its own requirements for high concurrency. If the current setup can meet these needs, there is no need to pursue QPS/TPS metrics excessively.