After cleaning log files in the Prometheus node docdb and restarting the node, the disk space was not released. Previously, cleaning and restarting the node would release the space, but it doesn't work now. Why?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: prometheus节点 docdb中日志文件清理并重启节点后,发现磁盘空间并没有释放,之前清理并重启节点后会释放,但现在不行了,不知道为啥?

| username: xiaoxiaozuofang

[TiDB Usage Environment] Production Environment
[TiDB Version] tidb v6.1.0
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue] After cleaning the log files in the Prometheus node's docdb and restarting the node, the disk space was not released. Previously, cleaning and restarting the node would release the space, but now it doesn't. Not sure why.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: xfworld | Original post link

  1. Check if the dashboard has enabled Top SQL and continuous analysis features, and disable them if necessary.
  2. You can manually delete the file information in docdb to directly free up space.
  3. If you are not confident in handling it manually, you can scale down Prometheus and then scale it up again.

| username: redgame | Original post link

Long-running transactions may cause data to not be reclaimed in a timely manner.

| username: xiaoxiaozuofang | Original post link

Manually deleting files under docdb and restarting the Prometheus node did not free up disk space. What could be the issue? The Top SQL feature is also not enabled.

| username: xiaoxiaozuofang | Original post link

The command I used to restart the Prometheus node: tiup cluster restart <cluster-name> -N 172.17.4.249:9090. Restarting the node this way does not release the disk space.

| username: xfworld | Original post link

Once the files are deleted, the space is immediately freed.

The release of space is not related to restarting Prometheus…

| username: xiaoxiaozuofang | Original post link

After deleting files, I need to restart the Prometheus node each time to release disk space. It used to work this way, but recently even restarting doesn’t help. I don’t know what the reason is.

| username: xfworld | Original post link

How much space did you allocate… If it doesn’t work, try expanding it…

| username: xiaoxiaozuofang | Original post link

Log retention is set to 3 days, and nearly 400 GB has already been used.

| username: TiDB_C罗 | Original post link

I also encountered the issue where docdb files could not be immediately reclaimed after deletion. A systemctl restart of prometheus-9000 was required to release them.

| username: xfworld | Original post link

It seems the process still holds open file handles to the deleted files, so the operating system cannot actually reclaim the space.

If that doesn't work, try stopping the service first, then deleting the files, and then starting it again.

For example:
systemctl stop prometheus
delete file data…
systemctl start prometheus
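
To check whether a process is really still holding deleted files open, a minimal Linux-only sketch like the one below may help. It reads the links under /proc/<pid>/fd and prints those whose target ends in " (deleted)". The function name deleted_open_files and the use of file descriptor 9 are illustrative choices, not part of TiUP or Prometheus, and the demo runs against a throwaway temporary file rather than real docdb data.

```shell
#!/usr/bin/env bash
# Print every file that process <pid> still holds open although it has
# been deleted. Relies on /proc/<pid>/fd, so this only works on Linux.
deleted_open_files() {
    local pid="$1" fd target
    for fd in /proc/"$pid"/fd/*; do
        # An fd can disappear between the glob and readlink; skip it.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s\n' "$target" ;;
        esac
    done
}

# Demo: hold a temporary file open in this shell, delete it, then list it.
tmp=$(mktemp)
exec 9< "$tmp"             # fd 9 keeps the file open (hypothetical fd number)
rm -f "$tmp"               # the directory entry is gone, the data is not
deleted_open_files "$$"    # prints the temp path followed by " (deleted)"
exec 9<&-                  # closing the handle lets the kernel free the space
```

Against a real service, you would pass the PID of the process that owns the docdb files (run as root or as the same user). If it lists deleted files, the space is only released once that process closes them or exits. Where lsof is installed, `lsof +L1` gives a similar system-wide view of open files that have been unlinked.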

| username: songxuecheng | Original post link

tiup cluster reload xx -R prometheus
tiup cluster reload xx -R grafana

| username: xiaoxiaozuofang | Original post link

The issue has been resolved. Restarting or reloading Prometheus did not help; directly killing the monitoring node's process freed the disk space.

| username: Kongdom | Original post link

It looks like the process is occupying it. :joy::joy::joy: