TiDB Result Set Processing Issues

translator_bot · June 23, 2024, 10:58am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB结果集处理问题

| username: TiDBer_7odj6O9Q

Consulting experts:

When TiDB performs a full table scan and needs to return 100GB of data, does the engine read and send this 100GB of data to the client as it reads, or does it load the 100GB of data into memory before sending it?
How does the client API handle the result set cursor? Does it process the data as it reads, cache all the data locally before processing, or support both methods?

Use case: ETL large-scale data synchronization processing.
Actual business: Data processing for a telecom operator’s economic analysis.

translator_bot · June 23, 2024, 10:58am

| username: 小王同学Plus | Original post link

In this data warehouse scenario, you can consider using TiFlash TiFlash 简介 | PingCAP 文档中心

translator_bot · June 23, 2024, 10:58am

| username: cs58_dba | Original post link

Generally, it’s not recommended to read such a large amount of data at once, otherwise, the memory would have already been overwhelmed.

translator_bot · June 23, 2024, 10:58am

| username: ddhe9527 | Original post link

The data returned by the engine to TiDB can only be considered as intermediate results of pushdown computation, not the final data returned to the client. This part of the data is stored in the memory of the TiDB instance. When the memory usage reaches mem-quota-query, it will be stored on the disk (oom-use-tmp-storage is true by default). If there is no space on the disk, it will be handled according to the oom-action setting. Therefore, if you need to return 100GB of data, these intermediate data must be temporarily stored in the memory and disk of the TiDB instance before being sent to the client.