What is the impact of available TiKV storage capacity on performance? Experts, please join the discussion

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV存储可用容量对性能的影响有几何?大佬速来讨论

| username: jaybing926

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v7.1.5
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
Recently, we have been deleting historical data to free up space, and the available capacity is now about 25%. During the deletion process, we gradually noticed that the KV IO Util has been decreasing.
So here is the question: how much of TiKV's storage capacity can safely be used? Beyond what usage level does performance start to degrade? Is there any related documentation or convention?
Not sure if this is an issue specific to TiDB or a general storage issue, let’s discuss it together~

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: tidb狂热爱好者 | Original post link

SSD performance should be measured by response time, not IO Util.
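To illustrate: `%util` can sit high on an NVMe drive that is nowhere near saturated, because such drives service many requests in parallel; per-request latency tells you more. Below is a rough sketch (Linux-only; the helper names are my own, not from any library) of deriving the average request latency, analogous to iostat's `await`, from two `/proc/diskstats` samples:

```python
def await_ms(prev, curr):
    """Average request latency in ms between two samples, analogous to
    iostat's await. Each sample is a tuple:
    (reads_completed, ms_spent_reading, writes_completed, ms_spent_writing)."""
    d_ios = (curr[0] - prev[0]) + (curr[2] - prev[2])
    d_ms = (curr[1] - prev[1]) + (curr[3] - prev[3])
    return d_ms / d_ios if d_ios else 0.0

def sample(device):
    """One sample for `device` (e.g. 'nvme0n1') from /proc/diskstats.
    Field positions follow the kernel's Documentation/admin-guide/iostats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                # 3 = reads completed, 6 = ms spent reading,
                # 7 = writes completed, 10 = ms spent writing
                return (int(fields[3]), int(fields[6]),
                        int(fields[7]), int(fields[10]))
    raise ValueError(f"device {device!r} not found in /proc/diskstats")
```

Taking `sample()` twice, a second apart, and feeding both readings to `await_ms()` gives roughly the combined r_await/w_await that `iostat -x 1` would report for that interval.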

| username: jaybing926 | Original post link

Which metric is the response time?

| username: jaybing926 | Original post link

The request frequency hasn’t changed much.

| username: zhaokede | Original post link

The impact of available storage capacity on TiKV performance is a multifaceted and complex issue.

| username: 小于同学 | Original post link

It’s a bit complicated.

| username: Kongdom | Original post link

:flushed: I don’t understand. More availability means faster speed? Less availability means slower speed?

| username: 有猫万事足 | Original post link

I only know that there will be an alert when it exceeds 80%. Also, PD tends not to schedule regions to TiKV nodes with more than 80% storage usage, unless all TiKV nodes exceed 80%.

So I feel that keeping it below 80% should be a better choice.
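For what it's worth, that 80% threshold corresponds to PD's `low-space-ratio` setting (default 0.8); there is also `high-space-ratio` (default 0.7), below which PD treats a store as having ample space. Assuming a tiup-managed v7.1 cluster and a PD endpoint at 127.0.0.1:2379 (adjust to your environment), they can be inspected and tuned via pd-ctl:

```shell
# Show PD's current space-related scheduling thresholds
tiup ctl:v7.1.5 pd -u http://127.0.0.1:2379 config show | grep space-ratio

# Make PD stop balancing regions onto a store earlier, at 75% usage
tiup ctl:v7.1.5 pd -u http://127.0.0.1:2379 config set low-space-ratio 0.75
```

This is a config fragment, not a recommendation to change the defaults; lowering `low-space-ratio` trades usable capacity for an earlier safety margin.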

| username: 托马斯滑板鞋 | Original post link

What kind of storage is it? Is it NVMe?

| username: jaybing926 | Original post link


| username: 托马斯滑板鞋 | Original post link

Previously, someone in the group mentioned that enterprise-level NVMe performance declines when reaching 80% capacity usage, and the performance drop becomes more noticeable as it approaches 100% usage (you can ask the manufacturer for specifics).

| username: forever | Original post link

I remember SSDs have this performance degradation.

| username: Kongdom | Original post link

:flushed: I’ve deployed so many clusters but haven’t used SSDs yet~ I didn’t know they had this characteristic.

| username: ziptoam | Original post link

Performance degradation in SSDs when nearing full capacity is usually more noticeable than in HDDs. It is recommended to keep it below 70%.

| username: TiDBer_rvITcue9 | Original post link

At what level of remaining free space does performance start to be affected?

| username: forever | Original post link

The reason for the performance degradation of SSDs (Solid State Drives) when storage is nearly full is mainly related to the working mechanism of the flash memory chips used internally. Flash memory chips need to be erased before writing data. As the available capacity decreases, the number of clean pages (i.e., pages that can be written without erasing) also decreases, while the proportion of dirty pages that need to be erased increases, affecting write efficiency.

Additionally, modern SSDs often use SLC emulation cache technology to improve write performance. However, this caching mechanism relies on sufficient available space. When the SSD capacity is nearly full, the caching effect diminishes, leading to performance degradation. Specifically, SSDs are configured to have a portion of storage running at high speed, with the rest running at a slower speed. The faster portion is the SSD’s cache, whose size depends on the remaining space on the SSD. The more data stored on the SSD, the smaller the SLC cache, and the slower the write speed may be.
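The erase-before-write effect above can be illustrated with a toy flash-translation-layer simulation (entirely my own construction, not how any particular drive works): logical pages are overwritten at random, and greedy garbage collection must relocate a victim block's still-valid pages before erasing it. Write amplification, i.e. physical page writes per user page write, climbs as the drive fills:

```python
import random
from collections import deque

def simulate_write_amplification(num_blocks=64, pages_per_block=32,
                                 fill_ratio=0.5, user_writes=5000, seed=0):
    """Write amplification of a toy FTL under random overwrites with
    greedy GC (victim = sealed block with the fewest valid pages)."""
    rng = random.Random(seed)
    # Leave two blocks of headroom so GC always has somewhere to copy to.
    num_logical = int(fill_ratio * (num_blocks - 2) * pages_per_block)

    pages = [set() for _ in range(num_blocks)]  # valid logical pages per block
    clean = deque(range(num_blocks))            # erased, writable blocks
    open_block = clean.popleft()                # block currently receiving writes
    used_in_open = 0                            # pages consumed in the open block
    physical = 0                                # total physical page writes
    loc = {}                                    # logical page -> block holding it

    def place(lpage):
        nonlocal open_block, used_in_open, physical
        if used_in_open == pages_per_block:     # open block full: grab a clean one
            open_block = clean.popleft()
            used_in_open = 0
        old = loc.get(lpage)
        if old is not None:
            pages[old].discard(lpage)           # previous copy becomes stale
        pages[open_block].add(lpage)
        loc[lpage] = open_block
        used_in_open += 1
        physical += 1

    def gc():
        # Keep at least two clean blocks in reserve: erase the sealed block
        # with the fewest valid pages, relocating those pages first.
        while len(clean) < 2:
            sealed = [b for b in range(num_blocks)
                      if b != open_block and b not in clean]
            victim = min(sealed, key=lambda b: len(pages[b]))
            for lpage in list(pages[victim]):
                place(lpage)                    # each relocation costs a write
            pages[victim].clear()
            clean.append(victim)                # "erase" the victim

    for lpage in range(num_logical):            # initial sequential fill
        place(lpage)
    physical = 0                                # measure steady state only

    for _ in range(user_writes):
        gc()
        place(rng.randrange(num_logical))

    return physical / user_writes
```

In this model a half-full drive stays close to 1x (GC victims are mostly stale, so little copying is needed), while a 90%-full drive drags several GC copy-writes along with each user write, which is the qualitative behavior the post describes.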

| username: zhaokede | Original post link

Indeed, there is such a reason, this is a hardware characteristic of SSDs.

| username: TiDB_C罗 | Original post link

This is the first time I’ve heard that SSDs behave this way.

| username: Kongdom | Original post link

:joy: This feature is a bit low-end.

| username: 我是吉米哥 | Original post link

It is recommended to keep the capacity of a single TiKV node at no more than 4 TB.