Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Does RocksDB support ACID transactions? (请问RocksDB支持ACID事务吗)
I read a post (“Comparison of the RocksDB and InnoDB database engines” on the CSDN blog) that says RocksDB does not support ACID transactions. However, the TiDB documentation says that TiDB supports ACID transactions. Isn’t RocksDB the storage engine of TiDB? So does RocksDB support transactions or not? I’m a bit confused. Also, MySQL uses InnoDB. What are the advantages and disadvantages of a key-value engine compared to InnoDB, and why didn’t TiDB choose InnoDB?
RocksDB itself does not directly support traditional ACID transactions. In TiDB, an additional transaction management layer is used to support ACID transactions. RocksDB was chosen as the storage engine for TiDB mainly because of its excellent performance and scalability, and it is well-suited for use with distributed architectures.
Is there any documentation explaining the implementation of ACID transaction support through an additional transaction management layer?
RocksDB instance:
Stores user data and MVCC information (commonly referred to as kvdb). There are four ColumnFamilies in kvdb: raft, lock, default, and write:
- Raft column: Used to store metadata for each Region. It occupies a very small amount of space, so users do not need to pay attention to it.
- Lock column: Used to store pessimistic locks for pessimistic transactions and the Prewrite locks from the first phase of distributed transactions. After a user’s transaction is committed, the corresponding data in the lock cf is quickly deleted, so in most cases the lock cf holds very little data (less than 1GB). If the data in the lock cf grows significantly, a large number of transactions are waiting to be committed, which points to a bug or failure in the system.
- Write column: Used to store the user’s actual write data and MVCC information (the start timestamp and commit timestamp of the transaction to which the data belongs). When a user writes a row of data, the row is stored in the write column if it is shorter than 255 bytes; otherwise it is stored in the default column. Since the value stored by TiDB’s non-unique index is empty and the value stored by the unique index is the primary key index, secondary indexes only occupy space in the write cf.
- Default column: Used to store data longer than 255 bytes.
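The routing rule described in the list above can be sketched in a few lines of Python. This is a toy model, not TiKV code: the dictionaries standing in for column families, the names `new_cfs` and `put_committed`, and the key layout (write cf keyed by commit timestamp, default cf keyed by start timestamp) are assumptions made for illustration. The boundary at exactly 255 bytes follows the wording above and may differ in the real implementation.

```python
# Toy model of how a committed value might be routed between column families.
# Assumption: names and key layout are illustrative, not TiKV's actual format.

SHORT_VALUE_LIMIT = 255  # threshold in bytes, taken from the description above


def new_cfs():
    """Create empty stand-ins for the lock, write and default column families."""
    return {"lock": {}, "write": {}, "default": {}}


def put_committed(cfs, key, value, start_ts, commit_ts):
    """Record a committed write, inlining short values in the write cf."""
    if len(value) < SHORT_VALUE_LIMIT:
        # Short value: stored together with the MVCC information.
        cfs["write"][(key, commit_ts)] = {"start_ts": start_ts, "short_value": value}
    else:
        # Long value: the write cf keeps only MVCC information,
        # the payload itself goes to the default cf.
        cfs["write"][(key, commit_ts)] = {"start_ts": start_ts, "short_value": None}
        cfs["default"][(key, start_ts)] = value


cfs = new_cfs()
put_committed(cfs, b"row-1", b"x" * 10, start_ts=1, commit_ts=2)   # short row
put_committed(cfs, b"row-2", b"y" * 300, start_ts=3, commit_ts=4)  # long row
```

Under these assumptions, only `row-2` produces an entry in the default cf, which matches the point above that short values such as index entries occupy space only in the write cf.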
RocksDB does not support transactions, but TiKV does. TiKV is a highly layered architecture that builds a Raft layer, an MVCC layer, and a transaction layer on top of RocksDB.
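To illustrate how a transaction layer can sit on top of a plain key-value store, here is a minimal single-process sketch loosely modeled on the prewrite/commit flow mentioned for the lock column. The class `ToyStore` and all of its methods are hypothetical; Raft replication, lock timeouts, rollback and lock checks on reads are deliberately left out, so it should not be read as how TiKV is actually implemented.

```python
# Toy two-phase (prewrite, then commit) transaction layer over plain dicts.
# Assumption: every name here is illustrative; this is not TiKV's API.


class ToyStore:
    def __init__(self):
        self.lock = {}   # key -> start_ts of the transaction holding the lock
        self.data = {}   # (key, start_ts) -> value written by that transaction
        self.write = {}  # key -> list of (commit_ts, start_ts), in commit order
        self._ts = 0     # single timestamp source, so timestamps only increase

    def next_ts(self):
        self._ts += 1
        return self._ts

    def get(self, key, read_ts):
        """Snapshot read: newest version committed at or before read_ts."""
        for commit_ts, start_ts in reversed(self.write.get(key, [])):
            if commit_ts <= read_ts:
                return self.data[(key, start_ts)]
        return None

    def prewrite(self, mutations, start_ts):
        """Phase 1: lock every key, or lock none of them on conflict."""
        for key in mutations:
            if key in self.lock:
                return False  # another transaction holds the lock
            versions = self.write.get(key, [])
            if versions and versions[-1][0] > start_ts:
                return False  # a newer commit exists: write-write conflict
        for key, value in mutations.items():
            self.lock[key] = start_ts
            self.data[(key, start_ts)] = value
        return True

    def commit(self, keys, start_ts):
        """Phase 2: publish the versions and release the locks."""
        commit_ts = self.next_ts()
        for key in keys:
            assert self.lock.get(key) == start_ts, "key was not prewritten"
            self.write.setdefault(key, []).append((commit_ts, start_ts))
            del self.lock[key]
        return commit_ts


store = ToyStore()
t1 = store.next_ts()
if store.prewrite({"a": "1", "b": "1"}, t1):
    store.commit(["a", "b"], t1)
```

The underlying dictionaries know nothing about transactions, just as RocksDB does not in this setup. Atomicity and snapshot reads come entirely from the bookkeeping in the layer above.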
In TiDB, RocksDB is only responsible for how data is stored on disk, while transactions are implemented by the TiDB cluster as a whole.
RocksDB does not support ACID transactions.
To implement a distributed system, you need a storage structure that suits distribution. MySQL’s InnoDB is clearly not suitable for distributed storage, because the data is stored on a single machine and cannot be split. An LSM-tree engine, by contrast, exposes a key-value model in which each piece of data can be split off, which makes distribution much easier. Distributed systems need data that is easy to split; that’s my understanding.
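As a rough illustration of why a sorted key-value model is easy to split, the sketch below keeps a sorted list of range boundaries and maps each key range to a node. The class `RegionMap`, its methods and the node names are hypothetical, and moving the actual data during a split is omitted.

```python
# Toy range-partitioned key space: each range of keys is owned by one node.
# Assumption: all names are illustrative and no real data is moved.
import bisect


class RegionMap:
    def __init__(self):
        # Range i covers keys in [starts[i], starts[i + 1]); the last is open-ended.
        self.starts = [""]
        self.nodes = ["node-1"]

    def locate(self, key):
        """Return (range index, node) for the range that contains key."""
        i = bisect.bisect_right(self.starts, key) - 1
        return i, self.nodes[i]

    def split(self, split_key, new_node):
        """Split the range containing split_key; the upper half moves to new_node."""
        i = bisect.bisect_right(self.starts, split_key)
        if self.starts[i - 1] == split_key:
            raise ValueError("a range already starts at this key")
        self.starts.insert(i, split_key)
        self.nodes.insert(i, new_node)


regions = RegionMap()
regions.split("m", "node-2")  # keys from "m" upward now belong to node-2
```

Because keys are ordered, a split is just a new boundary in the list, and a lookup stays a binary search no matter how many ranges exist.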
This can also be done with MySQL’s InnoDB. Tencent’s TDSQL, for example, automatically shards databases and tables.
I haven’t tried TDSQL, but I understand that it uses InnoDB as the storage engine of a distributed database. Does it come with built-in sharding middleware?
There is a front-end gateway for database sharding.