Are there best practices for TiDB in high concurrency read scenarios?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB 在高并发读的场景有最佳实践么

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] V6.5.8
[Encountered Problem: Problem Phenomenon and Impact] In the official documentation I could only find the write-side guide (Best Practices for High-Concurrency Writes | PingCAP Documentation Center). Does anyone have best practices for high-concurrency read scenarios in TiDB?

| username: zhanggame1 | Original post link

The official documentation probably doesn't have one. You can read about small-table caching to improve read performance here: Cached Tables | PingCAP Documentation Center
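For reference, enabling the cached table feature (available since TiDB v6.0) is a single DDL statement. `region_dict` here is a hypothetical small, rarely updated lookup table; the 64 MiB limit is the documented default cap on cached table size:

```sql
-- Cache a small, infrequently written table entirely in TiDB's memory
-- (the table data must fit under the cached-table size limit, 64 MiB by default).
ALTER TABLE region_dict CACHE;

-- Writes to a cached table are slower while the cache lease is held,
-- so revert if write latency on this table becomes a problem:
ALTER TABLE region_dict NOCACHE;
```

This mainly helps hot reads on small reference tables, not large-table scans.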

| username: YuchongXU | Original post link

SSD + multiple nodes

| username: changpeng75 | Original post link

TiDB's storage layer (TiKV) uses an LSM-tree structure, which is inherently write-optimized and comparatively weaker for reads. Because writes are append-only, a single read may have to check multiple versions of a key across levels, and with the MVCC mechanism on top of that, a workload dominated by highly concurrent reads is not TiDB's strongest suit.

| username: 小于同学 | Original post link

Haven’t encountered it.

| username: DBAER | Original post link

I am here to learn.

| username: kelvin | Original post link

Here to learn as well.

| username: tidb菜鸟一只 | Original post link

If it's a hot-read workload, you can enable small-table caching or Follower Read. For ordinary concurrent reads, you can distribute the table's Regions evenly across different TiKV nodes to take advantage of multi-node parallel reading.
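The two suggestions above can be sketched as follows; the table name `t` and the key range are hypothetical, so adjust them to the actual schema:

```sql
-- Follower Read: allow read requests to be served by follower replicas
-- instead of only the Region leader.
SET GLOBAL tidb_replica_read = 'follower';

-- Pre-split the table into Regions so PD can scatter them across TiKV
-- nodes (assumes an integer primary key roughly in this range).
SPLIT TABLE t BETWEEN (0) AND (1000000000) REGIONS 64;

-- Inspect how the Regions and their leaders are distributed.
SHOW TABLE t REGIONS;
```

`tidb_replica_read` also accepts values such as `leader-and-follower`, which spreads reads across all replicas rather than shifting them entirely off the leader.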

| username: residentevil | Original post link

There are no hot reads, but the data volume is very large and the SQL is relatively simple (though with many index lookups followed by table row retrievals). However, after deploying 24 TiKV instances, stress testing showed very high unified read pool (UnifyReadPool) utilization.

| username: tidb菜鸟一只 | Original post link

Shouldn't the SQL be optimized first? Please share the SQL's execution plan.

| username: residentevil | Original post link

The SQL execution plan meets expectations. If we want to increase read QPS, is scaling out the number of TiKV instances the only option?

| username: redgame | Original post link

It’s not universal; it depends on the hardware environment and other factors.

| username: TiDBer_aaO4sU46 | Original post link

Perform SQL optimization

| username: tidb菜鸟一只 | Original post link

You can also optimize the UnifyReadPool thread pool.

If the data is indeed balanced, the SQL execution plan is fine, and each scan does not involve much data, but there are a lot of concurrent SQLs, then you might really need to expand the resources of the TiKV nodes or add more TiKV nodes…
If there are OLAP-type requests (report-type requests) that scan a large amount of data, you can also deploy TiFlash nodes to alleviate the pressure on the TiKV nodes…
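For the UnifyReadPool tuning mentioned above, the pool size is set in the TiKV configuration. The value below is illustrative only, not a recommendation; by default TiKV sizes the pool to roughly 80% of the machine's CPU cores:

```toml
# tikv.toml -- the unified read pool serves both point gets and coprocessor reads
[readpool.unified]
# Raise this only if the host still has idle CPU and the Grafana
# "Unified read pool CPU" panel shows the pool pinned at its limit;
# oversizing it just adds context-switch overhead.
max-thread-count = 20
```

If the pool is already consuming all available CPU, raising this setting will not help; at that point the realistic options are the ones discussed above: reduce per-query work, add TiKV resources, or offload analytical scans to TiFlash.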

| username: 有猫万事足 | Original post link

This is a pretty good testing method. Keep increasing concurrency until SQL execution time exceeds expectations (or plateaus at the expected value), record the concurrency level where that happens, and compare it against your baseline requirement. If it falls short, watch the unified read pool limit and other metrics to see which one hits its bottleneck first, then optimize accordingly.

| username: TiDBer_5cwU0ltE | Original post link

Every company has its own definition of high concurrency. If the current setup meets your needs, there is no reason to chase QPS/TPS numbers for their own sake.