Rapid Disk Space Growth on TiDB Cluster Prometheus Node

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 集群 promtheus节点 磁盘空间增长快速

| username: TiDBer_yyy

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.4
[Reproduction Path] Disk usage on the cluster’s Prometheus machine has grown from 200G to 1200G since March, and it is growing particularly fast.
[Encountered Problem: Phenomenon and Impact]
The disk is growing faster and faster.
[Resource Configuration]
Startup script

exec bin/prometheus/prometheus \
    --config.file="/data/tidb-deploy/prometheus-9090/conf/prometheus.yml" \
    --web.listen-address=":9090" \
    --web.external-url="http://xxxxxx:9090/" \
    --web.enable-admin-api \
    --log.level="info" \
    --storage.tsdb.path="/data/tidb-data/prometheus-9090" \
    --storage.tsdb.retention="30d" \
    --storage.tsdb.max-block-duration=2h \
    --storage.tsdb.min-block-duration=2h

[Attachments: Screenshots/Logs/Monitoring]
PS:

  1. The cluster disk was expanded during this period, but the growth was not significant.
  2. It has been confirmed that the machine’s Prometheus TSDB occupies the most disk space.

Question:
How to reduce Prometheus DB storage space?

| username: zhanggame1 | Original post link

Adjust the retention period of the monitoring data. If the existing data is no longer needed, you can scale in and then scale out the monitoring components.

| username: cassblanca | Original post link

Set --storage.tsdb.retention to a smaller value, or cap the storage size directly with --storage.tsdb.retention.size, which sets the maximum size of stored data to retain.

| username: 昵称想不起来了 | Original post link

Set --storage.tsdb.retention smaller? Can the data for important time ranges that you want to keep be exported as a snapshot?

| username: 大飞哥online | Original post link

--storage.tsdb.retention="30d" keeps one month of data; it can be changed to 15d or 7d.

| username: TiDBer_yyy | Original post link

The company requires retaining data for 30 days, so there is not much room to adjust it.

| username: 大飞哥online | Original post link

The maximum number of bytes of storage blocks to retain. Supported units: KB, MB, GB, TB, PB.

  --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE

Or just set a fixed size limit.

| username: TiDBer_yyy | Original post link

Got it, I’ll give it a try.

| username: 像风一样的男子 | Original post link

Your monitoring data size seems abnormal. My 30 days of data take up less than 100GB. Check whether your Prometheus is experiencing any issues.

| username: tidb菜鸟一只 | Original post link

How large is your cluster for the Prometheus data to be this big?

| username: TiDBer_yyy | Original post link

30TB, 28 TiKV

| username: DBRE | Original post link

Add the following to the tikv job section of the Prometheus configuration file to stop collecting some metrics. This reduces storage, but the configuration is overwritten every time the topology changes, so you need to reapply it and restart Prometheus.

metric_relabel_configs:
  - source_labels: [__name__]
    separator: ;
    regex: tikv_thread_nonvoluntary_context_switches|tikv_thread_voluntary_context_switches|tikv_threads_io_bytes_total
    action: drop
  - source_labels: [__name__,name]
    separator: ;
    regex: tikv_thread_cpu_seconds_total;(tokio|rocksdb).+
    action: drop
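
To check which series these rules would drop before restarting Prometheus, the matching can be approximated offline. This is a minimal sketch, not Prometheus's own code: it assumes the documented relabeling behavior (source label values joined by the separator, missing labels treated as empty, regex anchored at both ends), and it uses Python's `re` rather than RE2, which behaves the same for these two patterns. The thread names in the usage note are hypothetical.

```python
import re

# The two drop rules from the configuration above: (source_labels, regex).
DROP_RULES = [
    (
        ["__name__"],
        r"tikv_thread_nonvoluntary_context_switches"
        r"|tikv_thread_voluntary_context_switches"
        r"|tikv_threads_io_bytes_total",
    ),
    (
        ["__name__", "name"],
        r"tikv_thread_cpu_seconds_total;(tokio|rocksdb).+",
    ),
]


def is_dropped(labels, rules=DROP_RULES, separator=";"):
    """Return True if any drop rule fully matches the joined label values."""
    for source_labels, pattern in rules:
        # Missing labels are treated as empty strings when joining.
        joined = separator.join(labels.get(name, "") for name in source_labels)
        # fullmatch mimics the anchoring applied to relabel regexes.
        if re.fullmatch(pattern, joined):
            return True
    return False
```

For example, `is_dropped({"__name__": "tikv_thread_cpu_seconds_total", "name": "rocksdb:low0"})` is true, while the same metric with `"name": "grpc-server-0"` is kept.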

| username: zhanggame1 | Original post link

The scale is quite large. How many servers are there?

| username: TiDBer_yyy | Original post link

About 40+ in total; that's not too large, right? According to official TiDB sources, clusters of up to 300TB are supported.

| username: TiDBer_yyy | Original post link

Got it, boss.

| username: zhanggame1 | Original post link

Being supported is one thing, but the hardware requirements are also high. You should add more disks.

| username: TiDBer_yyy | Original post link

Alright :rofl:

| username: tidb菜鸟一只 | Original post link

TiDB has many default monitoring items, and it certainly supports clusters of this scale. However, if the monitoring data must be kept for 30 days, the volume will certainly be substantial.
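
To get a rough sense of how substantial, the Prometheus storage documentation suggests estimating disk needs as retention time multiplied by ingestion rate and bytes per sample. The sketch below applies that rule of thumb. The default of 2 bytes per sample is the upper end of the 1 to 2 bytes the documentation cites, and the ingestion rate in the usage note is a made-up figure, not a measurement from this cluster.

```python
def estimate_tsdb_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB size: retention_seconds * ingestion rate * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


def to_gib(num_bytes):
    """Convert bytes to GiB for easier reading."""
    return num_bytes / (1024 ** 3)
```

The real ingestion rate can be read from Prometheus itself with `rate(prometheus_tsdb_head_samples_appended_total[5m])`. With a hypothetical rate of 100,000 samples per second, `estimate_tsdb_bytes(30, 100_000)` gives about 518 GB for 30 days, so a large cluster with many series can plausibly reach this range without anything being broken.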

| username: redgame | Original post link

Try stopping the unnecessary monitoring items.
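
One way to decide which items are unnecessary is to ask Prometheus which metric names hold the most series. The sketch below reads the TSDB status endpoint (`/api/v1/status/tsdb`) and ranks its `seriesCountByMetricName` entries. It assumes the Prometheus build in use exposes that endpoint (older releases may return 404), and the address in the usage comment is a placeholder.

```python
import json
import urllib.request


def top_series_by_metric(tsdb_status, limit=10):
    """Return (metric_name, series_count) pairs from a TSDB status payload, largest first."""
    entries = tsdb_status.get("data", {}).get("seriesCountByMetricName", [])
    pairs = [(entry["name"], int(entry["value"])) for entry in entries]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:limit]


def fetch_tsdb_status(base_url):
    """Fetch the TSDB status JSON from a Prometheus server."""
    url = base_url.rstrip("/") + "/api/v1/status/tsdb"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage (placeholder address, replace with the real Prometheus host):
#   status = fetch_tsdb_status("http://127.0.0.1:9090")
#   for name, count in top_series_by_metric(status):
#       print(f"{count:>10}  {name}")
```

Metric names near the top of this list that no dashboard or alert uses are candidates for a `metric_relabel_configs` drop rule.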

| username: TiDBer_yyy | Original post link

Bro, I also feel that TiDB has too many default monitoring items. Which ones can be cleaned up?