[TiDB Usage Environment] Production Environment
[TiDB Version] 5.3.0
[Reproduction Path] Querying a large table (100 million+ rows) using the following streaming query
However, the database still encountered an Out Of Memory Quota![conn_id=94979] issue
mem-quota-query is set to 1G
I would like to ask the experts, is it because using streaming queries only reduces the number of records returned to the client at once by the database server, but on the database server side, all the data is still queried at once, and the queried data exceeds 1G, causing the issue?
For full table scans, just add an index, and use secondary index lookups.
It seems there isn’t a much better way.
Check the monitoring to see if this SQL is occupying memory.
In the SQL, two ORDER BY clauses are used, and a full table scan is performed, pulling all the data from the table to the TiDB server for sorting. Naturally, this can easily cause an OOM (Out of Memory) issue. First, check if the project_id can filter a lot of data and create an index.
In actual scenarios, the project_id basically does not filter out data. So I understand that because the data volume is too large, it is all sorted on the TiDB server, causing an OOM, right?
I checked the monitoring and this SQL does indeed consume a lot of memory. It’s because it’s a big data query, so we used this cursor approach, but I didn’t expect it to result in an OOM (Out of Memory) error.
I understand that streaming queries should be executed in parts, which should be different from regular queries that retrieve all results at once, right?
Try removing the order by.
My understanding is that the order by causes streaming to be unavailable. Without sorting all the data, it is not known which row should be returned first.
Another way is to increase the value of mem-quota-query. This configuration is mainly to ensure the performance of TP services. It is obvious that your query is not a TP service but an AP service, so you can appropriately increase this parameter.
I think so too. It is because of this sorting that all the data was loaded into the TiDB server, resulting in a single query exceeding 1GB. I tried adding indexes for patientid and visitid, and it worked.