Querying TIKV_REGION_STATUS and TIKV_REGION_PEERS Tables Causes PD OOM

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 查询TIKV_REGION_STATUS和TIKV_REGION_PEERS表导致PD OOM

| username: dba-kit

Background:
I wanted to investigate the data distribution of a certain table, so I executed the following SQL:

-- Count the region peers of one table on each TiKV store
USE INFORMATION_SCHEMA;
SELECT
    p.STORE_ID, COUNT(1) AS peer_count
FROM
    TIKV_REGION_STATUS s JOIN TIKV_REGION_PEERS p ON s.REGION_ID = p.REGION_ID
WHERE
    s.DB_NAME = 'db_name' AND
    s.TABLE_NAME = 'tb_name'
GROUP BY p.STORE_ID;

As a result, the PD node ran out of memory (OOM). In the memory curve below, the first two peaks come from querying each of the two tables with LIMIT 1, and the last peak comes from the JOIN.
[image: PD memory usage curve]

I checked the official documentation, which has the following description. Does this mean that tidb-server turns queries on these two tables directly into PD API calls?

| username: tidb菜鸟一只

How much memory does your PD have? How many tables are there? Can querying a system table really exhaust the memory?

| username: h5n1

How large is the data-dir of PD now?

| username: dba-kit

The machine has 16 GB of memory, and the cluster currently has more than 1.3 million regions. It seems that PD is not releasing memory in a timely manner, which causes the issue.

| username: dba-kit

Does this have anything to do with disk space?

| username: RenlySir

If you remove the GROUP BY, does it still OOM?

| username: zhaokede

Does PD hit an OOM (out of memory) every time you execute this SQL?

| username: h5n1

It has nothing to do with disk space, but it is related to the amount of data, which is why I want to know the current PD data volume.

| username: TiDBer_jYQINSnf

Is PD deployed on the same machine as other components? 1.3 million regions is indeed quite a lot. Can you also crash it by running the region command in pd-ctl?

| username: DBAER

This seems a bit strange. Try checking directly with pd-ctl.
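
For looking at a single table without joining the two INFORMATION_SCHEMA tables, one option that could be tried is TiDB's SHOW TABLE ... REGIONS statement. This is only a sketch: db_name and tb_name are the placeholders from the original query. Nobody in this thread confirmed whether this statement avoids the PD memory spike on a cluster with 1.3 million regions, so it is safer to try it on a test cluster or during a quiet period first.

```sql
-- Sketch only: list the regions that belong to one table.
-- db_name and tb_name are placeholders taken from the original query.
USE db_name;

-- Each result row describes one region; the LEADER_STORE_ID column
-- shows which store currently holds that region's leader.
SHOW TABLE tb_name REGIONS;
```

Note that this answers a narrower question than the JOIN in the original post: it shows where each region's leader sits, not a per-store count of all peers.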