HashAgg Causes High Memory Usage in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: HashAgg 导致TiDB内存占用过高

| username: yuqi1129

【TiDB Usage Environment】Production Environment or Test Environment or POC
Production
【TiDB Version】
v5.1.0
【Encountered Problem】
TiDB memory usage is too high, and it seems to be caused by HashAgg, as shown in the picture below:

PS: Checking the process found that this instance occupies 70% of the memory (the physical machine has a total of 128G memory). Is there any good solution for this situation?
The server-memory-quota configuration does not seem recommended for production use.
The tidb_mem_quota_query configuration is somewhat inappropriate; I need to limit memory but do not want query OOM errors.
【Reproduction Path】What operations were performed to encounter the problem

【Problem Phenomenon and Impact】

【Attachments】

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: xfworld | Original post link

The only thing we can do is to find this SQL and optimize it reasonably; it seems there’s no other way.

| username: Raymond | Original post link

If you can accept the SQL execution being a bit slower, you can switch to stream aggregation.

| username: ddhe9527 | Original post link

Please post the SQL and execution plan for review.

| username: yuqi1129 | Original post link

agg cannot use hint binding in CTE

SELECT
stats_date
FROM
(
WITH fusion_order AS (
SELECT
/*+ STREAM_AGG() /
stats_date,
union_id,
union_name,
zone_id,
zone_name,
org_id,
org_name,
(
CASE WHEN site_id = 1 THEN ‘one’ WHEN site_id = 5 THEN ‘two’ END
) AS org_type,
(
CASE WHEN shipment_type = 8 THEN ‘on-1’ WHEN shipment_type = 10 THEN ‘on-2’ WHEN (
(
shipment_type IN(7, 9)
AND (
order_from = ‘Shop_Lite’
OR order_type = ‘attr’
)
)
OR shipment_type = 1
) THEN ‘x1’ WHEN (
order_from != ‘Shop_Lite’
AND shipment_type = 7
AND order_type = ‘natural’
) THEN ‘x1’ WHEN (
order_from != ‘Shop_Lite’
AND shipment_type = 9
AND order_type = ‘natural’
) THEN ‘x3’ END
) AS shipment_type,
SUM(valid_ord_cnt) AS ord_cnt
FROM
nrdc_bi.ads_o2ofusion_ord_org_d
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8,
9
),
fusion_order_org AS (
SELECT
/
+ STREAM_AGG() /
stats_date,
union_id,
union_name,
zone_id,
zone_name,
org_id,
org_name,
site_id,
SUM(valid_ord_cnt) AS ord_cnt
FROM
nrdc_bi.ads_o2ofusion_ord_org_d
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8
),
retail_order AS (
SELECT
/
+ STREAM_AGG() */
t1.stats_date,
t1.union_id,
t1.union_name,
t1.zone_id,
t1.zone_name,
t1.org_id,
t1.org_name,
(
CASE WHEN t1.site_id = 1 THEN ‘x1’ WHEN t1.site_id = 5 THEN ‘x1’ END
) AS org_type,
‘x’ AS shipment_type,
SUM(t2.ord_cnt - t1.ord_cnt) AS ord_cnt
FROM
fusion_order_org t1
INNER JOIN nrdc_bi.ads_retail_ord_store_d t2 ON t1.stats_date = t2.stats_date
AND t1.org_id = t2.org_id
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8,
9
)
SELECT
*
FROM
fusion_order
UNION ALL
SELECT
*
FROM
retail_order
) sql_model_virtual_table_new_27082_12067
where
stats_date like ‘%zghe%’
limit
1;

Executing this plan individually indeed uses stream agg, but if you use SPM for binding, it will not take effect.

| username: yuqi1129 | Original post link

The main issue is the use of CTE, and SQL binding is not effective.

| username: ddhe9527 | Original post link

What is the health of the statistics for the tables nrdc_bi.ads_o2ofusion_ord_org_d and nrdc_bi.ads_retail_ord_store_d? From the execution plan, the estRows returned to the TiDB side is only about 10,000 rows, so it is unlikely to occupy such a large amount of memory. Additionally, is the (stats_date, org_id) combination unique on at least one side in either fusion_order_org or nrdc_bi.ads_retail_ord_store_d?

| username: yuqi1129 | Original post link

This is the actual situation. It doesn’t seem large at the hundred-thousand level, but this SQL is called quite frequently, currently around 10~20 times per second.

The statistics health is good.

Additionally, is the (stats_date, org_id) combination unique in either fusion_order_org or nrdc_bi.ads_retail_ord_store_d?
Yes, this is the primary key of nrdc_bi.ads_retail_ord_store_d.

| username: ddhe9527 | Original post link

It might still be related to the concurrency. If binding under CTE cannot be used (PS: I haven’t verified this, but it should be usable), you can try the following methods:

  1. Use tidb_executor_concurrency to limit the concurrency of HashAgg.
  2. Scale out the number of TiDB instances.
  3. Use token-limit to limit the concurrency of TiDB instances.
  4. Have the development team modify the SQL and release a new version, adding a hint to change it to STREAM_AGG.

Additionally, it might be necessary for TiDB experts to check if there are other underlying reasons.

| username: yuqi1129 | Original post link

Okay, thank you :+1:

| username: ddhe9527 | Original post link

You’re welcome. Actually, if you drag your screenshot to the right, you can also see the specific memory usage in the memory field.

| username: system | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.