The issue of large disk space usage in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb硬盘空间占用很大问题

| username: zhanggame1

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 7.5
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact]
The business data in TiDB is just over 30GB. During database testing, automated tests are run, involving a large number of partition table additions and deletions. Currently, the database data files (SST) occupy a significant amount of space.

Through analysis using information_schema.TIKV_REGION_STATUS, it was found that the largest data occupiers are stats_history and tidb_ddl_history. Is there any way to clean them up?

| username: 小龙虾爱大龙虾 | Original post link

By default, it is automatically cleaned. Early versions had a bug where it couldn’t be cleaned. Check if there is a time field in this table.

| username: tidb狂热爱好者 | Original post link

This is already version 7.5.

| username: 路在何chu | Original post link

You delete it manually.

| username: wangccsy | Original post link

Reduce capacity and delete non-essential data.

| username: zhanggame1 | Original post link

How to delete, delete from mysql.stats_history or truncate table mysql.stats_history

| username: zhanggame1 | Original post link

Using create_time to check, all are within 7 days, but the data volume is too large.

| username: porpoiselxj | Original post link

This large-scale deletion may cause OOM. It’s best to control the amount yourself and delete in batches.

| username: tidb狂热爱好者 | Original post link

truncate table mysql.stats_history

| username: zhanggame1 | Original post link

Deleting also doesn’t release space, so I ended up truncating it.

| username: TiDB_C罗 | Original post link

I don’t understand why TiDB isn’t designed like this:
After a delete, if the region where the data is located is empty, just release the space occupied by the region, or if the deleted data spans the entire region, just delete the region. A natural partition model would be so much better.

| username: zhanggame1 | Original post link

I think it might be that scanning all the data in the region if it’s empty is too costly. Truncate or drop doesn’t have this problem.

| username: TiDB_C罗 | Original post link

Truncate and drop in this situation are excessive. The data has already been partitioned into various regions, so further partitioning seems a bit redundant.

| username: tidb狂热爱好者 | Original post link

Why do you have to save space?

| username: zhanggame1 | Original post link

It is definitely unacceptable for 30GB of data to occupy more than 500GB of hard disk space.

| username: zhanggame1 | Original post link

The effect of testing truncate on mysql.stats_history in the backup database is very good.

| username: 像风一样的男子 | Original post link

If you don’t want this statistic, just turn off the system variable tidb_enable_historical_stats.

| username: zhanggame1 | Original post link

Is there a parameter to control the size of tidb_ddl_history, which is also very large?

| username: 像风一样的男子 | Original post link

Couldn’t find this.

| username: 小龙虾爱大龙虾 | Original post link

Aren’t you afraid the bug isn’t fixed? :rofl: