Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 扩容tidb后,sql只占用其中一个内存,导致内存不足
[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
Executing a SQL statement results in an error: “Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query.” Adjusting the tidb_mem_quota_query parameter to the maximum still doesn’t work. After expanding TiDB, the tidb_mem_quota_query was increased again.
[Encountered Issue: Problem Phenomenon and Impact]
After expansion, executing SQL shows that the load is only on the TiDB instances from before the expansion. The newly expanded instances have no CPU usage. Can any experts help with this?
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
The root cause is the poor SQL performance, optimize the SQL.
After the expansion, is the data balanced?
Additionally, what is the maximum memory for the machine? The following configuration will also affect it: TiDB 内存控制文档 | PingCAP 文档中心
Why does this graph seem like one TiDB is slacking off and not working?
Are there two TiDBs, but only one instance was scaled up?
Do you have load balancing in front? The client connects to whichever TiDB, and that TiDB works.
Is it connected through HAProxy?
It feels like there is no load balancing at all, and it is very likely that it was originally connected directly to the TiDB server.
Without considering load balancing, just normally added a TiDB instance.
A statement will only be executed on one TiDB, right? It won’t be distributed to all TiDB nodes, right?
Without load balancing, the execution of SQL on a specific TiDB instance is entirely determined by the client.
First, if you are using HAProxy, you need to add the newly expanded nodes to the HA configuration so that your business connections can connect to the newly expanded TiDB server. If you are not using HA, then your application should be directly connected to the previous TiDB server, and you need to direct some connections to the newly expanded TiDB server to distribute the load.
If there is no load balancing, then your business side is still connected to the original tidb-server. The newly expanded ones are essentially not being used, so the imbalance is expected.
Learned something new, it turns out to be a basic load balancing issue~ Overthought it.
I have already set up load balancing, but will a single SQL query ultimately only land on one server? Can’t two TiDB instances share the load of a single SQL query simultaneously?
A single SQL can only run in one TiDB, and then the operators are pushed down, with multiple TiKVs processing the data.
I have not yet encountered an SQL query that cannot be optimized.
The TiDB server cannot be distributed; each statement will have only one TiDB server owner to execute it. The TiDB server is stateless, and distributed computing is not performed at this layer.
Multiple TiDB instances cannot share the load of a single SQL query. If TiKV supports operator pushdown, multiple TiKV instances can execute it.
TiDB server does not have high availability and load balancing, so scaling up feels useless. The focus is on SQL issues.
No, for a specific SQL, it can only be executed on one TiDB server. Load balancing solves the problem of distributing the same type of SQL executed multiple times across different TiDB servers. For example, if you execute a SQL statement 1000 times, theoretically, two TiDB servers would each execute it 500 times.