How to Use NVMe SSD Disks with TiDB?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB如何使用NVME SSD磁盘?

| username: jingyesi3401

[TiDB Usage Environment] Production Environment, Testing

[TiDB Version] v6.1.5

[Encountered Problem: Phenomenon and Impact] To improve the IOPS of a newly built TiDB cluster, we plan to mount local disks directly. To keep the data safe, we plan to configure the server disks as RAID10. However, the hardware vendor told us that NVMe SSDs are not supported by RAID cards. After further research, we found that RAID can be achieved with the VROC solution, but the cost is relatively high (we are still waiting for a quote). The hardware vendor recommended software RAID instead, but we are concerned about potential pitfalls. Has anyone deployed something similar and can share their experience? Thank you.

| username: zhanggame1 | Original post link

With TiDB, SSDs are generally not put into RAID.
TiDB itself is highly available. By default, data is written in three replicas, so the failure of a single disk does not cause data loss.

We consulted TiDB vendor engineers, and their advice was not to use RAID.
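
To illustrate the three-replica point: TiKV replicates data with Raft, which stays available as long as a majority of replicas survive. The sketch below is only the majority arithmetic, assuming each replica sits on a different disk; the function name is made up for illustration.

```python
def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a Raft majority is still alive."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    quorum = replicas // 2 + 1  # smallest majority
    return replicas - quorum


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} replicas tolerate {tolerated_failures(n)} failed replica(s)")
```

With the default of three replicas, one lost disk still leaves a majority, which is why a single disk failure is survivable without RAID.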

| username: caiyfc | Original post link

Putting NVMe drives in RAID won’t significantly improve read and write speeds; at most, the data will be safer. However, TiDB is highly available, so if a disk fails, you can simply scale in that node, and the impact will be minimal.

| username: jingyesi3401 | Original post link

The main purpose of RAID is safety. After all, with tens of terabytes of data, relying on replica recovery is likely to be more time-consuming and costly than simply replacing a disk.
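
A back-of-the-envelope way to check this concern is to estimate how long re-replicating the data of a failed disk would take. This is only a sketch: the function name is hypothetical, and the sustained throughput is an assumption you would have to measure on your own cluster, since it depends on network, disks, and scheduling limits.

```python
def hours_to_rereplicate(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy data_tb terabytes at a sustained MB/s rate (decimal units)."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return total_mb / throughput_mb_s / 3600


if __name__ == "__main__":
    # Hypothetical inputs: 10 TB on the failed disk, 200 MB/s sustained recovery rate.
    print(f"{hours_to_rereplicate(10, 200):.1f} hours")
```

Only the data that lived on the failed disk has to be copied again, not the whole cluster, so the per-disk data volume is the number to plug in.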

| username: caiyfc | Original post link

But the cost is high; it depends on whether you can accept that.
Of course, with RAID on top of TiDB’s built-in high availability, the data will definitely be safer.
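
To put a rough number on the cost: RAID10 mirrors every disk, so it halves usable capacity, and that multiplies with the replica count. A minimal sketch, ignoring compression and filesystem overhead (the function name is made up for illustration):

```python
def raw_bytes_per_logical_byte(replicas: int, raid10: bool) -> int:
    """Raw disk bytes consumed for each byte of user data."""
    raid_factor = 2 if raid10 else 1  # RAID10 keeps two copies of every block
    return replicas * raid_factor


if __name__ == "__main__":
    print("3 replicas, no RAID :", raw_bytes_per_logical_byte(3, raid10=False))
    print("3 replicas, RAID10  :", raw_bytes_per_logical_byte(3, raid10=True))
```

So three replicas on RAID10 need about six times the raw capacity of the data, compared with three times without RAID.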

| username: zhanggame1 | Original post link

There are no RAID cards for NVMe drives, right? If you really need RAID, you can use software RAID, which doesn’t cost much.

| username: jingyesi3401 | Original post link

The vendor provided a software RAID solution for testing, but it is not OS-level software RAID; it also depends on hardware, and it doesn’t seem to be cheap either. We are currently testing its IOPS.

| username: MrSylar | Original post link

Most server RAID cards indeed do not support configuring RAID for SSDs. If you want to use hardware RAID, you should specifically request it when purchasing the server.

| username: zhanggame1 | Original post link

OS-level software RAID is also very mature. It is worth considering, and I suggest testing it.

| username: zhanggame1 | Original post link

All RAID cards support SATA SSDs, but NVMe support on a RAID card is pointless because the card can’t keep up with the drives’ speed.

| username: MrSylar | Original post link

This kind of high availability requirement generally comes from operations and maintenance; performance is not their primary goal.

| username: xfworld | Original post link

Provisioning more hardware and scaling out nodes is also a high availability solution.

RAID is a node-local high availability solution, but you need to pay attention to cache behavior and disk flush intervals…

All of these are optional; the only considerations are maintenance cost and hardware cost…

| username: zhanggame1 | Original post link

If you don’t mind spending money on operations and maintenance, attaching storage devices like all-flash arrays offers higher speed and reliability.

| username: Jellybean | Original post link

There is no need to use RAID. We have deployed many production clusters without RAID. The three-replica mechanism is sufficient to ensure high availability and data safety.

| username: redgame | Original post link

No need for RAID; just go ahead without it.

| username: zhanggame1 | Original post link

Nowadays, mainstream Kubernetes and hyper-converged high availability setups do not rely on RAID; RAID cards are generally only used for system disks.

| username: 像风一样的男子 | Original post link

If budget is not an issue, definitely go for RAID. Most operations and maintenance professionals trust RAID more.

| username: zhanggame1 | Original post link

It’s basically impossible to buy an NVMe RAID card, and the speed wouldn’t be sufficient anyway. A RAID card occupies one PCIe slot and connects a bunch of drives, whereas each NVMe drive needs its own PCIe lanes.
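
The lane argument can be made concrete with some arithmetic. The sketch below assumes PCIe 3.0 (8 GT/s per lane with 128b/130b encoding), four lanes per NVMe drive, and a card in an x8 slot; the drive count, lane widths, and names are all illustrative assumptions, not measurements.

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 8 bits per byte -> about 0.985 GB/s
PCIE3_GB_S_PER_LANE = 8 * 128 / 130 / 8


def link_bandwidth_gb_s(lanes: int, per_lane: float = PCIE3_GB_S_PER_LANE) -> float:
    """Theoretical one-direction bandwidth of a PCIe link with the given lane count."""
    return lanes * per_lane


if __name__ == "__main__":
    drives = 8  # hypothetical number of NVMe drives
    direct = drives * link_bandwidth_gb_s(4)  # each drive on its own x4 link
    behind_card = link_bandwidth_gb_s(8)      # all drives behind one x8 card
    print(f"direct-attached: {direct:.1f} GB/s, behind one x8 card: {behind_card:.1f} GB/s")
```

Under these assumptions the single slot becomes the bottleneck, since all drives would share the bandwidth that two direct-attached drives get on their own.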

| username: 裤衩儿飞上天 | Original post link

  1. If you have the money and don’t mind the cost, you can do it; the data will be more secure.
  2. Generally speaking, you don’t have to do it. With the default three replicas and proper backups, the probability of data loss is already quite low.
  3. When it comes to data security, no solution can guarantee 100% safety. You really have to weigh it yourself.