Practical Experience with TiKV Disk Size and Number of Regions

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv磁盘大小、region数实践

| username: xie123

[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
TiDB v6.5.5 and v5.4.0
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issues: Issue Phenomenon and Impact]
Questions:

  1. Are there any production practices where clusters are deployed on HDD disks?
  2. With HDDs, what is the maximum size of a single TiKV instance, and what disk usage rate and region count are reasonable (for a TiKV node configured with 30GB of memory)?
  3. How large is the performance difference between HDD and SSD?
  4. When using large HDDs, what number of regions and what region size are recommended for a single TiKV?

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
| username: hey-hoho | Original post link

Having worked with numerous TiDB clusters, here are my personal views:

  1. In production, I haven’t seen TiKV deployed on HDDs. TiDB itself has lower disk requirements, but since PD is usually co-located with TiDB, HDDs are rarely used for those nodes either. I once had a customer deploy on virtual machines with NAS disks, and PD and TiKV kept running into inexplicable issues that all traced back to insufficient disk performance. In the end, every one of those clusters had its disks replaced.
  2. For a TiKV node with 30GB of memory, I’d estimate the CPU allocation at 8-12 cores at most. With SSDs, even 1TB might be more than enough; with HDDs the capacity would need to be smaller still (the slowest component sets the ceiling).
  3. The performance difference is significant. SSDs roughly fall into three tiers: SATA SSD (around 500MB/s), SAS SSD (around 800MB/s), and NVMe SSD (2000MB/s and up). A regular HDD reaching 300MB/s is already impressive (see the sequential-read sketch at the end of this post).
  4. Generally, larger disks call for more CPU and memory, unless it’s a cold-archive scenario with more writes and fewer reads. Based on your needs, HDDs are generally not recommended.

In summary, it’s not that TiDB can’t run on HDDs, but the risk is extremely high, leading to various odd issues. After extensive troubleshooting, you’ll likely end up replacing the disks anyway, so there’s no need to waste time on unnecessary experiments.
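
A quick way to see which of those tiers a disk actually lands in is a sequential-read fio run. This is only a minimal sketch, assuming fio is installed and that /data/fio-test is a placeholder directory on the disk you want to measure:

fio -group_reporting -thread -name=seq_read_test -directory=/data/fio-test -rw=read -direct=1 -size=8G -numjobs=4 -ioengine=psync -bs=1m -ramp_time=10 -runtime=60 -time_based

The aggregate read bandwidth in the output gives a rough idea of whether the disk behaves like a SATA SSD, an NVMe SSD, or an HDD; keep in mind that random IO, which TiKV relies on heavily, will come out far lower than the sequential number, especially on HDDs.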

| username: xfworld | Original post link

They can also accept lowering the access frequency and reducing the amount of data processed to fit the lower-spec configuration. :laughing:

| username: 春风十里 | Original post link

In terms of single-IO latency: a single IO on an SSD generally completes in 0.1-1ms, while a single IO on an HDD takes 4-10ms.
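
If you want to see these per-IO latencies on your own hardware, a single-job, queue-depth-1 random read with fio is a reasonable way to measure them. A minimal sketch, with /data/fio-test as a placeholder path on the disk under test; look at the clat (completion latency) figures in the output:

fio -group_reporting -thread -name=lat_test -directory=/data/fio-test -rw=randread -direct=1 -size=2G -numjobs=1 -iodepth=1 -ioengine=psync -bs=4k -runtime=60 -time_based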

| username: 像风一样的男子 | Original post link

The performance gap between SSD and HDD is like night and day. You can perform an IO test yourself.

IOPS test reference command

fio -group_reporting -thread -name=iops_test -rw=randwrite -direct=1 -size=8G -numjobs=8 -ioengine=psync -bs=4k -ramp_time=10 -randseed=0 -runtime=60 -time_based

Read/write bandwidth test reference commands

fio -group_reporting -thread -name=bw_write_test -rw=randwrite -direct=1 -size=8G -numjobs=8 -ioengine=psync -bs=1m -ramp_time=10 -randseed=0 -runtime=60 -time_based
fio -group_reporting -thread -name=bw_read_test -rw=randread -direct=1 -size=8G -numjobs=8 -ioengine=psync -bs=1m -ramp_time=10 -randseed=0 -runtime=60 -time_based

| username: zhanggame1 | Original post link

NVMe drives typically have microsecond-level latency, but issues arise when the latency reaches milliseconds.

| username: xie123 | Original post link

There is definitely a big IO gap between HDD and SSD. I have also tested the IO of a single-disk SSD and a single-disk HDD, and in theory a single disk’s read/write IO meets our requirements. With no hotspot issues and many TiKV instances, the observed read/write throughput is only a few dozen MB per node. That’s why I want to know whether anyone has deployed TiKV on HDDs in production.
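
One way to confirm how much throughput the TiKV data disks actually see under the real workload is to watch iostat on each TiKV host. A minimal sketch, assuming the sysstat package is installed and that sdb is a placeholder name for the data disk:

iostat -xm 5 sdb

Beyond the MB/s columns, keep an eye on await and %util: an HDD can be saturated by only a few dozen MB/s of random IO even though its sequential bandwidth looks sufficient.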

| username: 像风一样的男子 | Original post link

Those who deployed with HDD all regretted it.
Users of TiDB generally have large amounts of data, and HDDs can’t keep up in terms of performance. Slow queries are mostly inserts, so they all eventually switched to SSDs.

| username: xie123 | Original post link

Roughly how much write volume and total storage were those clusters handling?

| username: 春风十里 | Original post link

NVMe drives are faster than regular SSDs. They used to be noticeably more expensive, but now the prices seem about the same. SSD prices are also getting close to those of regular mechanical drives, so for a new purchase, SSDs are still the better choice.

| username: 江湖故人 | Original post link

I haven’t encountered a database using HDDs in the past 5 years.

| username: TiDBer_jYQINSnf | Original post link

The clusters I’ve seen generally have fewer than 20k regions per single TiKV, and I haven’t used HDDs. If storing cold data, HDDs can also be considered.
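
For reference, the per-store region count can be read straight from PD. A minimal sketch through tiup’s pd-ctl, assuming a v6.5.5 cluster with a PD endpoint at 127.0.0.1:2379 (both placeholders):

tiup ctl:v6.5.5 pd -u http://127.0.0.1:2379 store | grep -E '"address"|"region_count"'

Each store’s region_count in the JSON output shows how many regions that TiKV instance is currently carrying.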

| username: Kongdom | Original post link

  1. Most of my disks here are HDDs.
  2. The official documentation also recommends not exceeding that size for a single TiKV.
  3. It depends on business needs; for ordinary workloads the difference doesn’t seem significant. However, you should run fio disk speed tests so you know exactly what you are working with.
  4. The default region size should be fine, right? (A quick way to check it is sketched right below.)
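
If you want to double-check what region size the cluster is actually using, the TiKV coprocessor settings can be queried from TiDB. A minimal sketch via the MySQL client, assuming TiDB listens on 127.0.0.1:4000 with the root user (placeholders); in these versions the default split size is typically 96MiB:

mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW CONFIG WHERE type='tikv' AND name LIKE 'coprocessor.region%'"
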
| username: dba远航 | Original post link

The performance gap between HDD and SSD is 10-20x. Disk size is tied to the data volume, usually 2-3 times the data volume. The number and size of regions should be adjusted according to the business situation and type: if the workload leans toward large tables and large transactions, the region size can be increased appropriately (one way to do that is sketched below).
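
If you do decide to raise the region size for a large-table, large-transaction workload, one way is to change the TiKV coprocessor settings in the cluster topology with tiup. A minimal sketch only, assuming a cluster named mycluster and example values; verify the keys and any version-specific caveats in the documentation before applying them:

tiup cluster edit-config mycluster
# under server_configs -> tikv, add for example:
#   coprocessor.region-split-size: 256MB
#   coprocessor.region-max-size: 384MB
tiup cluster reload mycluster -R tikv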

| username: zhang_2023 | Original post link

The size of the region is determined by the amount of data.

| username: TiDBer_lBAxWjWQ | Original post link

Most people use SSDs now, right?

| username: tidb菜鸟一只 | Original post link

What are the approximate IOPS and read/write bandwidth of the SSDs in your production environment? Could you share some test results so I can compare them with our current environment?

| username: 像风一样的男子 | Original post link

I am using Alibaba Cloud’s local SSD, and I have previously compared it with Alibaba Cloud’s ESSD cloud disk. There are two testing methods, one provided by Alibaba Cloud and the other by TiDB. The method provided by Alibaba Cloud is somewhat idealized.

| username: zhanggame1 | Original post link

280k IOPS already exceeds the speed of a typical single NVMe SSD, right?

| username: 双开门变频冰箱 | Original post link

The IOPS aren’t even in the same league.