What are the advantages of TiDB using key-value storage compared to row storage used by Oracle, MySQL, etc.?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB使用key-values的方式存储和oracle mysql等使用行存储的优势是什么?

| username: 春风十里

I understand that TiDB stores data in TiKV in the form of key-value pairs, while Oracle and MySQL store data in data files in the form of rows. TiDB’s key-value storage method feels similar to Oracle’s index-organized tables. My question is why choose this storage method, and what are its advantages and disadvantages?

| username: 考试没答案 | Original post link

Advantages:

  1. No full table lock, all DDL operations can be performed online asynchronously.
  2. Sequential writing is particularly fast.

I found that OceanBase also uses this method.

| username: 考试没答案 | Original post link

The drawbacks I currently see are: merging, merging. This merging process takes a bit too long and consumes quite a lot of resources.

| username: xfworld | Original post link

You can understand the differences between LSM Tree and B+ Tree in terms of storage structure and applicable scenarios.

As for why TiDB, Oracle, and MySQL choose different implementations, it is determined by the applicable scenarios set by the product.

Reference documents:

| username: 裤衩儿飞上天 | Original post link

Understand the use cases of LSM and B+ tree.

| username: tidb菜鸟一只 | Original post link

Isn’t this the difference between LSM trees and B-trees in storage?

| username: Raymond | Original post link

I would like to ask, why does KV storage result in no full table lock?

| username: 春风十里 | Original post link

Thanks for sharing, I’ll take a look and study it.

| username: Running | Original post link

Distributed, scalable

| username: 春风十里 | Original post link

On the other hand, using KV for data abstraction in the underlying storage is a more flexible choice.

One benefit is simplicity. For scale-out requirements, sharding KV key-value pairs is much easier than sharding structured data with complex table structures. Additionally, abstracting the storage layer can also bring new options for computation, such as integrating with other computing engines and using them in parallel with the TiDB SQL layer. TiSpark is a great example of this.

From a development perspective, this separation also provides flexibility in choosing different programming languages for development. For the stateless computation layer, we chose Go, a language with extremely high development efficiency. For the storage layer project TiKV, which is closer to the system’s underlying layer and more performance-sensitive, we chose Rust. If all components were coupled together, it would be difficult to achieve such on-demand multi-language development. For the development team, it also allows professionals to do what they do best. Storage engine developers and SQL optimizer developers can work in parallel. Additionally, for distributed systems, almost all communication is done via RPC, so a more explicit layering is a natural and low-cost choice.

| username: 春风十里 | Original post link

I don’t quite understand this. It doesn’t seem to have much to do with a full table lock, does it?

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.