The TiKV disk fills up quickly. What is the cause, and can anyone help analyze it?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tikv磁盘很快就占满了,具体是什么原因,有没有大佬帮忙分析下?

| username: TiDBer_ZHcgATCp

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] V7.1.0
[Reproduction Path] Lightning imports data normally without issues, but disk usage has increased from 35% to 78% even though business data has not grown.
[Encountered Problem: Phenomenon and Impact] The disk is filling up. I'm not sure whether space is failing to be released, since tables are frequently renamed.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]


Do these empty Regions, which don't correspond to any table, hold historical data that hasn't been deleted and is still occupying space?

| username: zhanggame1 | Original post link

First, check if the GC is progressing normally.
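
One way to check this from a SQL client is to read the GC bookkeeping rows that TiDB keeps in the mysql.tidb table. The sketch below only prints the query. The function name gc_status_sql and the host, port, and user in the usage comment are placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Print a query that lists the GC-related rows of mysql.tidb
# (for example tikv_gc_safe_point, tikv_gc_last_run_time, tikv_gc_life_time).
gc_status_sql() {
  cat <<'SQL'
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM mysql.tidb
WHERE VARIABLE_NAME LIKE 'tikv_gc%';
SQL
}

# Example usage (connection details are assumptions; adjust to your cluster):
# gc_status_sql | mysql -h 127.0.0.1 -P 4000 -u root
```

If tikv_gc_safe_point keeps advancing and trails the current time by roughly tikv_gc_life_time, GC is probably progressing.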

| username: TiDBer_ZHcgATCp | Original post link

It looks like it’s progressing normally.

| username: Billmay表妹 | Original post link

Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page.
Let’s take a look at your configuration~

| username: TiDBer_ZHcgATCp | Original post link

Cousin, this one.

| username: terry0219 | Original post link

The time in your GC screenshot looks a bit strange.

| username: TiDBer_ZHcgATCp | Original post link

This is the server time, showing an extra hour.

| username: terry0219 | Original post link

How about checking this parameter: SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';

| username: TiDBer_ZHcgATCp | Original post link

SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';

| username: 小龙虾爱大龙虾 | Original post link

You mentioned that the disk is full. First, check the host to see what is occupying the space.

| username: TiDBer_ZHcgATCp | Original post link

I roughly estimated that the table size plus replicas account for about 35%. I don’t know what the rest is, but I see many regions without corresponding table names. I wonder if these are taking up the space.

| username: Jellybean | Original post link

In the PD panel, you can view the total cluster space, available space, used space, and total space for each node. You can also go to the tikv-details → Cluster panel to check the disk usage of each node. First, check the space usage status of the cluster.

| username: 江湖故人 | Original post link

If the region-split-size hasn’t been changed, even without GC, a region is only 96MB. Count how many regions there are without table names.
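
As a rough worked example of that arithmetic, the sketch below multiplies a Region count by the split size and a replica count. The function name is hypothetical, the default of 3 replicas is an assumption about the cluster, and the result is only a ceiling: empty Regions take far less than 96 MB each.

```shell
#!/usr/bin/env bash
# Rough upper bound, in MB, for the space a set of Regions could occupy.
# Args: region count, split size in MB (default 96), replica count (default 3, assumed).
estimate_region_space_mb() {
  local regions="$1" size_mb="${2:-96}" replicas="${3:-3}"
  echo "$(( regions * size_mb * replicas ))"
}

# Example: 1000 Regions at 96 MB with 3 replicas
# estimate_region_space_mb 1000
```

If the ceiling computed this way is far below the unexplained disk usage, the nameless Regions alone are unlikely to be the cause.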

| username: 春风十里 | Original post link

In the TIKV_REGION_STATUS table, rows where TABLE_ID, DB_NAME, and TABLE_NAME are NULL correspond to system tables or system Regions in TiKV. These Regions do not belong to any user table, so no table information is shown for them.
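
To put a number on these Regions, a query along the following lines counts them and sums their approximate size. This is a sketch: the function name and the connection details in the usage comment are placeholders, and APPROXIMATE_SIZE is taken to be in MB, as the TiDB documentation describes it.

```shell
#!/usr/bin/env bash
# Print a query that counts Regions with no table name and sums their approximate size.
nameless_region_sql() {
  cat <<'SQL'
SELECT COUNT(*) AS region_count,
       SUM(APPROXIMATE_SIZE) AS approx_size_mb
FROM INFORMATION_SCHEMA.TIKV_REGION_STATUS
WHERE TABLE_NAME IS NULL;
SQL
}

# Example usage (connection details are assumptions; adjust to your cluster):
# nameless_region_sql | mysql -h 127.0.0.1 -P 4000 -u root
```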

| username: 春风十里 | Original post link

Take a look at the Abnormal stores and Region Health in Grafana → PD to see if there are any abnormalities with the stores and if there are indeed many empty regions. I saw someone mention before that it might be a GC issue.

| username: Kongdom | Original post link

Check which files are taking up a lot of space. Are they log files or data files? Are they TiDB cluster files or other files? Hopefully, no one has uploaded a large file to the server.

| username: zhanggame1 | Original post link

Log in to the server and use du to check which files are occupying the space. Are they SST files?
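
To separate SST files from logs, something like the following could total the bytes per file extension. It is a sketch that assumes GNU find (for -printf). The function name and the data directory path in the usage comment are placeholders, and it reports apparent file size rather than blocks used on disk.

```shell
#!/usr/bin/env bash
# Sum the apparent size, in bytes, of all files with a given extension under a directory.
sum_bytes_by_ext() {
  local dir="$1" ext="$2"
  find "$dir" -type f -name "*.${ext}" -printf '%s\n' 2>/dev/null |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Example usage (the path is a placeholder for your TiKV data directory):
# sum_bytes_by_ext /path/to/tikv/data sst
# sum_bytes_by_ext /path/to/tikv/data log
```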

| username: 路在何chu | Original post link

First, check which files are taking up space.

| username: andone | Original post link

Check with du on the Linux host. Also check the GC progress and the Region merge status.

| username: Inkjade | Original post link

First, check which file is occupying a large amount of disk space:

  1. Go to the PD panel to view the total space of the cluster, the available space of each node, the used space, and the total space situation. You can also go to the tikv-details → Cluster panel to check the disk usage of each node.
  2. Use df -h to see which disk is heavily occupied.
  3. In the suspect directories, check which subdirectories and files are large:
    du -h --max-depth=1
    Check whether they are log files.
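
The directory check in step 3 can be sketched as a small helper that ranks the immediate subdirectories by size. It assumes GNU du (for --max-depth). The function name and the path in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# List a directory's total and its N largest immediate subdirectories, largest first.
# Sizes are in KiB; the first output line is the directory itself (the grand total).
top_dirs() {
  local dir="$1" n="${2:-10}"
  du -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$(( n + 1 ))"
}

# Example usage (the path is a placeholder for your TiKV deploy or data directory):
# df -h
# top_dirs /path/to/tikv 5
```

Running it again on the largest subdirectory narrows things down one level at a time.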