Modify Configuration Parameters to Accelerate Query Efficiency

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 修改配置参数,加快查询效率

| username: 小跑跑泡

The TiDB cluster has been deployed with 3 TiDB, 3 TiKV, 3 TiFlash, and 3 PD servers. The daily data increment is 50 million, with 23 CPU cores and a current usage rate of about 40%. The disk is a 10TB SSD. To speed up query efficiency, the global variable has been modified.
SET GLOBAL tidb_distsql_scan_concurrency = 23;

Question 1: Is this modification correct? It should speed up the query, right?
Question 2: Are there any other global parameter optimizations that can improve query efficiency?
Question 3: Now I want to add servers. I think it’s not good to deploy all services on one server and want to deploy them separately. The data retention period is 3 years. Do you have any good service configuration suggestions?
Question 4: The current data volume is 400 million. Sometimes the query list can be returned in seconds, but sometimes it is very slow for the same SQL, and sometimes there is a CPU spike. Only TiDB is deployed on the server. Why is this happening?


Currently, there are only four machines, and the deployment nodes are as shown in the figure.
There are now 7 servers with the same configuration. How should they be deployed?

| username: cy6301567 | Original post link

I need to use both AP and TP, and TiKV is 32. How should I optimize it?

| username: 有猫万事足 | Original post link

For hybrid deployment, it is essential to ensure resource isolation.
Pay attention to adjusting the parameters in the following document:

Question 1: This only increases the concurrency of the scan operation. It will only be used if there is a scan in the execution plan. Not all execution plans end with a scan. So you can’t expect this parameter to speed up all queries.

Question 2: You haven’t reached this step yet. Adjust the hybrid deployment parameters according to the document first to ensure stability. Otherwise, the cluster might crash when the business volume increases.

Question 3: If you are going to change the topology, it is strongly recommended that TiKV/TiFlash each have their own dedicated machine, while PD and TiDB can be placed together.

Question 4: It requires specific analysis based on the situation, especially the execution plan. Without this, I can only make a guess. The guess looks like insufficient I/O. :rofl:

| username: zhanggame1 | Original post link

Are all these components reading and writing to the same hard drive now?

| username: redgame | Original post link

Question 4, the issue of fluctuating speeds, we’ve experienced it before. It depends on the overall load on the database.

| username: cassblanca | Original post link

In a mixed deployment, a 10T SSD is used. There are no performance issues during sequential read and write operations, but when there are more random read and write operations, hotspots will occur.

| username: tidb菜鸟一只 | Original post link

If there are a total of 7 machines, it is recommended to have 3 TiDB+PD (with NUMA resource isolation), 3 TiKV, and 1 TiFlash.
If there are a total of 11 machines, it is recommended to have 3 TiDB (one of which can be replaced with TiKV) + 3 PD, 3 TiKV, and 2 TiFlash.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.