Performance Optimization Configuration Issues

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 性能优化配置相关问题

| username: TiDBer_GEk1m6Na

【 TiDB Version】v7.5

I have recently been studying some configuration parameters of TiDB to conduct performance testing. After understanding TiDB’s storage architecture, there is a configuration parameter that I am not sure if it can be set.

For example, in a query request, TiDB needs to access PD and TiKV to retrieve a batch of data from the table. I want to know if there is a parameter that can set the size of the batch of data retrieved each time.

I found a parameter called max-batch-size, but its effect seems to be different from what I imagined.

| username: TiDBer_jYQINSnf | Original post link

This is a batch of RPCs. Multiple RPC requests are grouped together to reduce the number of round trips between TiDB and TiKV. This increases throughput but also increases latency.
For tuning, you can refer to this series of documents.

| username: TiDBer_GEk1m6Na | Original post link

Are there any other parameters that can be set to control the maximum amount of data that Tidb can retrieve in a single request? Theoretically, for a large table with tens of millions of rows, if only 1024 rows are retrieved at a time, the number of network requests would be too high.

| username: TiDBer_GEk1m6Na | Original post link

Mainly want to understand some configurations and trade-offs of TiDB under the storage-compute separation architecture.

| username: Soysauce520 | Original post link

The official recommendation is to keep the default settings. So far, I haven’t heard of anyone needing to change this. It’s basically all about separating storage and computation.

| username: zhanggame1 | Original post link

Stop tinkering with it; adjusting parameters in version 7.5 probably won’t have any noticeable effect.

| username: RenlySir | Original post link

From the SQL layer, optimization is actually the most effective. Appropriate indexing, good use of computation pushdown, and using TiFlash for complex SQL can basically solve more than 80% of the problems.

| username: TiDBer_jYQINSnf | Original post link

This is equivalent to achieving your goal. TiDB communicates with TiKV sequentially via gRPC. If you increase the gRPC batch size and send more gRPCs together, it effectively reduces the communication volume.

Are there other parameters that can be set to determine the maximum amount of data TiDB can fetch in a single request? Theoretically, for a large table with tens of millions of rows, fetching only 1024 rows at a time would result in too many network requests.

| username: TiDBer_ok0VXN1s | Original post link

User Response time = Service time + Queuing delay + Coherency delay

| username: TiDBer_rvITcue9 | Original post link

Checking in to take a look.

| username: GreenGuan | Original post link

My personal understanding is that it is inappropriate to set it based on the number of data entries. If a single entry contains a blob field that exceeds the limit, wouldn’t it be impossible to retrieve it forever?

| username: TIDB-Learner | Original post link

TiDB 7.5 performance optimization is so excellent now!!?? :star_struck: Does parameter tuning no longer work or is it more noticeable?

| username: DBAER | Original post link

Actually, I didn’t understand. Is it to reduce the impact of network interactions on performance, or to improve the performance of batch writes?

| username: zhanggame1 | Original post link

There are two default installation designs for databases. One is to occupy the least resources initially and then increase as needed, such as MySQL. TiDB and Oracle SQL Server, on the other hand, are designed to occupy most of the server’s resources from the start. Therefore, if components are separated and not mixed in deployment, there is no need to change many parameters for optimization in higher versions. If you encounter specific issues, such as slow index creation or slow table analysis, you can adjust accordingly.

| username: TiDBer_GEk1m6Na | Original post link

There might be some improvement in concurrent performance. However, the response to a single request is unlikely to be significantly affected. Some customers write poor SQL, such as group and sort operations that require fetching table data. As mentioned by others, optimization must be done at the SQL level. For example, fetching 1024 rows of data in a batch requires sending many data retrieval requests to complete the data fetching.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.