TiKV Oldest Snapshot Duration Retained for N Days Causes Disk Full

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv oldest snapshot duration 保留N天导致磁盘打满

| username: TiDBer_yyy

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.4, 5.0.5
[Reproduction Path] A monitoring alert reported that disk space was running out.

[Encountered Problem: Symptoms and Impact]
Original post: Version 5.0.4, one TiKV node's oldest snapshot duration retained for N days causes the disk to fill up - TiDB Q&A Community
Community suggestion: manual compaction is possible, but disk usage on the node is already over 90%, and compaction temporarily consumes additional disk space, which could fill the disk completely (see the tikv-ctl sketch at the end of this post).
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
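
For reference, a minimal sketch of the manual compaction the community suggested, using tikv-ctl against a running TiKV instance (the address and the choice of column families are placeholder assumptions, not from the original post):

```shell
# Compact the write CF first, then the default CF, on one TiKV node at a time.
# Compaction temporarily needs free space for the rewritten SST files, so start
# with the node that still has the most headroom.
tikv-ctl --host 10.0.1.11:20160 compact -d kv -c write
tikv-ctl --host 10.0.1.11:20160 compact -d kv -c default
```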

| username: 像风一样的男子 | Original post link

I saw your last post. Did you turn off GC?

| username: Fly-bird | Original post link

How large is the disk?

| username: tidb菜鸟一只 | Original post link

Why turn off GC? With GC disabled, historical data is never cleaned up, so of course disk usage keeps growing. And without GC there is no point in compacting either: compaction only reclaims the space that GC has already freed, so if GC is stopped, compaction accomplishes nothing.
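
As a quick sanity check on that point, a sketch of how to confirm the GC switch and retention setting (host, port, and credentials are placeholders):

```shell
# Check whether GC is enabled and how long history is kept (TiDB v5.0+).
mysql -h 10.0.1.10 -P 4000 -u root -p -e "
  SHOW GLOBAL VARIABLES LIKE 'tidb_gc_enable';
  SELECT VARIABLE_NAME, VARIABLE_VALUE
    FROM mysql.tidb
   WHERE VARIABLE_NAME IN ('tikv_gc_enable', 'tikv_gc_life_time');"
```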

| username: TiDBer_yyy | Original post link

We are using Dumpling to back up the data and set up a replica cluster, which is why GC was turned off.

| username: 像风一样的男子 | Original post link

What is the current status of the cluster?

| username: TiDBer_yyy | Original post link

Yes, it had been turned off. GC was enabled for a full day the day before yesterday, but the historical snapshots were still not cleared.

| username: dba-kit | Original post link

Version 5.0 does not support PITR incremental backups yet. If you want to perform a cluster migration, it is best to split the migration into batches and set up multiple TiCDC changefeeds for synchronization. Additionally, if the downstream is a TiDB cluster, a better choice is the BR tool, as its physical backup speed is very fast; Dumpling is too slow and has a significant impact on the cluster.
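
A hedged sketch of the TiCDC approach described above (the PD address, downstream sink URI, and changefeed ID are placeholder assumptions; per-batch table filters would go into a TOML file passed via --config):

```shell
# Create one changefeed per migration batch, replicating into the new TiDB cluster.
tiup cdc cli changefeed create \
  --pd=http://10.0.1.10:2379 \
  --sink-uri="mysql://root:password@10.0.2.10:4000/" \
  --changefeed-id="migrate-batch-1"
```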

| username: TiDBer_yyy | Original post link

The cluster is available, but the disk alerts keep firing. Disk usage on multiple machines has reached 90%, and the number of TiKV nodes hitting 90% keeps increasing.

| username: 像风一样的男子 | Original post link

If you need to set up a replica and GC would break the data consistency you need, it is recommended to extend the GC life time rather than disable GC entirely.
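
For example, extending the retention window instead of switching GC off (the 72h value is only an illustration; pick whatever the replica setup actually needs):

```shell
# Keep GC running but retain MVCC history longer (TiDB v5.0+).
mysql -h 10.0.1.10 -P 4000 -u root -p -e "SET GLOBAL tidb_gc_life_time = '72h';"
```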

| username: TiDBer_yyy | Original post link

BR backup failed. Additionally, due to various factors, S3 cannot be used as the backup storage.
Failed post: BR backup to S3 fails [BR:KV:ErrKVStorage] tikv storage occur I/O error - #27, from Billmay表妹 - TiDB Q&A Community

| username: TiDBer_yyy | Original post link

Okay, I’ll give it a try.

| username: 像风一样的男子 | Original post link

If your S3 backup is consistently unsuccessful, try switching to backup to an NFS shared disk.
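
A minimal sketch of a BR backup to an NFS path (PD address, mount point, and rate limit are assumptions; the same NFS export must be mounted at the same path on every TiKV node and the br host):

```shell
# "local://" works like shared storage when all nodes see the same NFS mount.
br backup full \
  --pd "10.0.1.10:2379" \
  --storage "local:///nfs/tidb-backup" \
  --ratelimit 128 \
  --log-file backupfull.log
```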

| username: TiDBer_yyy | Original post link

The company does not have an NFS shared disk yet. I will look into it later.

I am not sure whether enabling GC will solve the problem, as other clusters with GC enabled also run into similar issues.

| username: zhanggame1 | Original post link

How many days has the GC been turned off?

| username: TiDBer_yyy | Original post link

It was enabled for one day about 1.5 weeks ago, then turned off until this morning.

| username: 像风一样的男子 | Original post link

You can just find a server with a large enough disk and install NFS yourself.
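
A rough sketch of that setup on CentOS-style hosts (server address, export path, subnet, and the tidb user are assumptions):

```shell
# On the machine with the large disk (NFS server):
yum install -y nfs-utils
mkdir -p /data/nfs-backup && chown tidb:tidb /data/nfs-backup
echo "/data/nfs-backup 10.0.1.0/24(rw,sync,no_root_squash)" >> /etc/exports
systemctl enable --now nfs-server && exportfs -ra

# On every TiKV node and the br host (client side):
yum install -y nfs-utils
mkdir -p /nfs/tidb-backup
mount -t nfs 10.0.1.50:/data/nfs-backup /nfs/tidb-backup
```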

| username: TiDBer_yyy | Original post link

Thank you, expert.
I’ll look for a tutorial.
Additionally, the br backup is reporting an error. I’m not sure if it’s a cluster issue or something else.

| username: TiDBer_yyy | Original post link

Boss, returning to the topic of the post: I don't know whether enabling GC will solve the problem. It is currently enabled with gc-life-time = 480h, but the snapshot duration has not decreased.
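
One way to tell whether GC is actually making progress again (connection details are placeholders): if it is, tikv_gc_last_run_time should be recent and tikv_gc_safe_point should advance between checks.

```shell
mysql -h 10.0.1.10 -P 4000 -u root -p -e "
  SELECT VARIABLE_NAME, VARIABLE_VALUE
    FROM mysql.tidb
   WHERE VARIABLE_NAME IN ('tikv_gc_safe_point', 'tikv_gc_last_run_time', 'tikv_gc_leader_desc');"
```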

| username: 像风一样的男子 | Original post link

You have set the GC life time to 20 days (480h), but I see that your oldest snapshot is only 10 days old, so it will take another 10 days before GC starts cleaning up that data.
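
To make the arithmetic concrete (a sketch only, assuming gc-life-time stays at 480h): the GC safe point trails the current time by gc-life-time, so a data version only becomes reclaimable once it is more than 480 hours old.

```shell
# Rough estimate of the current GC safe point; versions newer than this
# timestamp are still retained for snapshot reads and cannot be reclaimed yet.
mysql -h 10.0.1.10 -P 4000 -u root -p -e \
  "SELECT NOW() - INTERVAL 480 HOUR AS approx_gc_safe_point;"
```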