Advantages and Disadvantages of Databases with Compute and Storage Separation

translator_bot · June 22, 2024, 11:04am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 计算和存储分离的数据库的优缺点

| username: Raymond

I would like to ask, what are the advantages and disadvantages of databases with separated compute and storage versus databases with integrated compute and storage?

translator_bot · June 22, 2024, 11:04am

| username: hey-hoho

Let me start the discussion. Both architectures you mentioned have representative products. TiDB is a typical example of the separated architecture, while OceanBase (OB), along with others such as CockroachDB and YugabyteDB where compute and storage run in a single process, represents the non-separated architecture.

Benefits of separation: Compute and storage can each be scaled horizontally on their own, which offers more flexibility.
Drawbacks of separation: Higher network communication costs (data has to be transferred to the compute layer for processing), which leads to higher latency.
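
To put rough numbers on the latency point, here is a minimal back-of-the-envelope sketch in Python. The function names, the 0.5 ms round trip, and the 10 Gbit/s link are illustrative assumptions, not measurements of TiDB or OB. Real systems also batch requests and cache data, which this model ignores.

```python
def integrated_latency_ms(scan_ms: float) -> float:
    """Compute and storage share a process: no network hop is modeled."""
    return scan_ms


def separated_latency_ms(scan_ms: float, payload_bytes: int,
                         rtt_ms: float, bandwidth_bytes_per_s: float) -> float:
    """Storage scans locally, then ships the rows to the compute layer."""
    transfer_ms = payload_bytes / bandwidth_bytes_per_s * 1000.0
    return scan_ms + rtt_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical workload: a 2 ms scan that returns 1.25 MB of rows.
    local = integrated_latency_ms(scan_ms=2.0)
    remote = separated_latency_ms(
        scan_ms=2.0,
        payload_bytes=1_250_000,
        rtt_ms=0.5,                           # assumed round trip inside one data center
        bandwidth_bytes_per_s=1_250_000_000,  # assumed 10 Gbit/s link
    )
    print(f"integrated: {local:.2f} ms, separated: {remote:.2f} ms")
```

In this toy model the extra cost grows with the amount of data shipped to the compute layer, so the gap is small for point lookups and larger for wide scans.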

Others are welcome to add more insights~

translator_bot · June 22, 2024, 11:04am

| username: TiDBer_jYQINSnf

Stating the obvious:

The more finely a system is split into components, the more fine-grained the management has to be, and the higher the extra overhead.

At small scale, splitting into three parts is definitely a losing proposition.

In large clusters, the finer the split, the higher the utilization of resources such as CPU and memory.
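
A rough sizing sketch in Python can make the small-versus-large trade-off concrete. Everything here is an assumption for illustration: the node sizes (16 cores, 4 TB), the single extra coordination node, and the simplification that storage nodes need no meaningful CPU and compute nodes need no disk.

```python
import math


def coupled_plan(cpu_need, disk_need_tb, cpu_per_node, disk_per_node_tb):
    """Every node carries both CPU and disk, so the scarcer resource sets the count."""
    nodes = max(math.ceil(cpu_need / cpu_per_node),
                math.ceil(disk_need_tb / disk_per_node_tb))
    return {
        "nodes": nodes,
        "cpu_util": cpu_need / (nodes * cpu_per_node),
        "disk_util": disk_need_tb / (nodes * disk_per_node_tb),
    }


def separated_plan(cpu_need, disk_need_tb, cpu_per_node, disk_per_node_tb,
                   overhead_nodes=1):
    """Compute and storage tiers are sized independently, plus fixed overhead nodes."""
    compute = math.ceil(cpu_need / cpu_per_node)
    storage = math.ceil(disk_need_tb / disk_per_node_tb)
    return {
        "nodes": compute + storage + overhead_nodes,
        "cpu_util": cpu_need / (compute * cpu_per_node),
        "disk_util": disk_need_tb / (storage * disk_per_node_tb),
    }


if __name__ == "__main__":
    # Hypothetical small workload: 8 cores, 1 TB.
    print("small coupled  ", coupled_plan(8, 1, 16, 4))
    print("small separated", separated_plan(8, 1, 16, 4))
    # Hypothetical large, CPU-heavy workload: 1600 cores, 40 TB.
    print("large coupled  ", coupled_plan(1600, 40, 16, 4))
    print("large separated", separated_plan(1600, 40, 16, 4))
```

Under these assumptions the small workload needs one coupled node but three separated ones, while the large workload leaves most of the coupled cluster's disk idle and the separated cluster keeps both tiers close to full.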

translator_bot · June 22, 2024, 11:04am

| username: 魔礼养羊

For reference only.
A typical database with compute-storage separation is TiDB; a typical database without it is OceanBase (OB).
In terms of architecture:

OB consists of two roles: the proxy server (scheduling, load balancing) and the observer (compute and storage). Observers are grouped into zones, and in larger deployments zones are grouped into regions. From an operational perspective, this is very convenient.
TiDB consists of PD (scheduling server), TiDB server (compute), and TiKV (storage). If analytical (AP) workloads are needed, TiFlash has to be added, and TiCDC has to be added for change data capture. With TiUP, deployment is not difficult either.

From an architectural perspective, for plain read and write operations, OB should be stronger than TiDB. For compute-heavy workloads, especially the kind that benefits from separating compute and storage, my guess is that TiDB is stronger than OB.

In practice, TiDB is easier to use than OB.

My personal suggestion: for a cluster with fewer than 10 nodes, TiDB is far stronger than OB; for 10-100 nodes, I am not sure which is stronger. Beyond 100 nodes, it depends on the DBA’s ability, and the DBA has the final say.

translator_bot · June 22, 2024, 11:04am

| username: tidb菜鸟一只

The advantages of a database with separated computing and storage are:

Elastic scalability: Compute and storage can be expanded and upgraded independently, so system capacity can be adjusted dynamically based on demand.
Improved performance: With compute and storage separated, each layer can be tuned for its own workload, which can improve query efficiency and overall system performance.
Increased availability: Because compute and storage are separated, a failure in one layer does not have to take down the other, which improves availability and fault tolerance.
Easier maintenance and management: Compute and storage can be managed independently, which makes overall system maintenance more convenient.
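
As a small illustration of adjusting capacity based on demand, the Python sketch below picks a scaling action from current utilization. The function name, the 0.8 threshold, and the action strings are hypothetical and do not come from any product.

```python
def scaling_actions(cpu_util: float, disk_util: float,
                    separated: bool, threshold: float = 0.8) -> list[str]:
    """Return the scale-out actions suggested by current utilization."""
    cpu_hot = cpu_util > threshold
    disk_hot = disk_util > threshold
    if separated:
        # Each tier grows only when its own resource is under pressure.
        actions = []
        if cpu_hot:
            actions.append("add compute node")
        if disk_hot:
            actions.append("add storage node")
        return actions
    # Coupled nodes bundle CPU and disk, so either bottleneck adds both.
    return ["add combined node"] if (cpu_hot or disk_hot) else []


if __name__ == "__main__":
    # Hypothetical readings: CPU is saturated, disk is mostly empty.
    print(scaling_actions(0.9, 0.3, separated=True))
    print(scaling_actions(0.9, 0.3, separated=False))
```

With a CPU-only bottleneck, the separated design adds just a compute node, while the coupled design also adds disk that the workload does not need yet.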

The disadvantages of separating computing and storage are:

Complex system architecture: After separating computing and storage, multiple components need to be linked and coordinated, making the system structure more complex.
Higher deployment costs: In the case of separated computing and storage, more hardware devices need to be deployed, resulting in higher costs.
Resource consumption for data transmission: After separating computing and storage, data needs to be transferred from the storage system to the computing system, which consumes resources and reduces system efficiency.

translator_bot · June 22, 2024, 11:04am

| username: xingzhenxiang

Database with Compute-Storage Separation:

Advantages:

The database’s compute and storage are handled by different hardware systems, allowing for better balance of system resources.
By separating compute and storage, each part can be easily scaled and upgraded, making the system more flexible and reliable.
Costs can be reduced as storage and compute can be on different hardware, allowing for hardware selection based on specific needs.

Disadvantages:

Separating compute and storage requires additional software and hardware, which increases costs.
Separation may increase latency due to the need to transfer data over the network.
System management complexity may increase as it involves handling two independent hardware systems.

Database without Compute-Storage Separation:

Advantages:

Compute and storage are on the same machine, reducing latency and improving performance.
Easier to manage as it involves operating a single machine.
This architecture is typically used for smaller systems, making it relatively easier to implement and maintain.

Disadvantages:

As data volume increases, more hardware resources may be needed, leading to higher costs.
This architecture is usually difficult to scale, making it unsuitable for large systems. Scaling may require a complete system redesign.
In case of hardware failure, all data may be lost if it lives only on that machine, since compute and storage are on the same machine.

In conclusion, the choice of architecture should be based on specific business needs, budget, and technical resources.

translator_bot · June 22, 2024, 11:04am

| username: liuis

Disadvantages: Increased system maintenance, increased costs
Advantages: Good scalability

translator_bot · June 22, 2024, 11:04am

| username: TiDBer_pkQ5q1l0

Increased network overhead.