Explanation of Memory

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于内存的说明

| username: vincentLi

There is a paragraph in the technical documentation that I can't understand:
RocksDB Overview | PingCAP Documentation Center

RocksDB Memory Usage

To improve read performance and reduce disk reads, RocksDB splits files stored on disk into blocks of a certain size (default is 64KB). When reading a block, it first checks the BlockCache in memory to see if the block data exists. If it does, it can be read directly from memory without accessing the disk.

BlockCache evicts infrequently accessed data according to the LRU (least recently used) algorithm. By default, TiKV uses 45% of the total system memory for BlockCache. Users can also set the storage.block-cache.capacity configuration item to an appropriate value, but it is not recommended to exceed 60% of the total system memory.
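To make the LRU behavior concrete, here is a minimal illustrative sketch of an LRU block cache in Python. This is not RocksDB's actual implementation (the real BlockCache is a sharded, concurrent C++ structure with byte-based capacity accounting); the class and method names here are hypothetical.

```python
from collections import OrderedDict

class BlockCache:
    """Minimal LRU cache sketch (illustrative only; real RocksDB's
    BlockCache is sharded, concurrent, and sized in bytes)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.cache = OrderedDict()  # block_id -> block data, oldest first

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as recently used
            return self.cache[block_id]
        return None  # cache miss: caller must read the block from disk

    def put(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
```

A read first calls `get`; only on a miss does the caller go to disk and `put` the block back into the cache, which may evict the coldest block.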

Data written to RocksDB is first written to the MemTable. When the size of a MemTable exceeds 128 MB, RocksDB switches to a new MemTable for writing. TiKV has a total of 2 RocksDB instances with 4 ColumnFamilies in total. The size limit for a single MemTable in each ColumnFamily is 128 MB, and at most 5 MemTables are allowed per ColumnFamily; beyond that, foreground writes are blocked. The maximum memory occupied by this part is therefore 4 × 5 × 128 MB = 2560 MB ≈ 2.5 GB. This memory usage is relatively small, and it is not recommended that users change it.
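The documentation's upper bound can be reproduced with a few lines of arithmetic. The variable names below mirror the RocksDB options mentioned later in the thread (max_write_buffer_number, write_buffer_size); the values are the defaults quoted in the documentation.

```python
# Upper bound on MemTable memory, per the documentation:
# 2 RocksDB instances with 4 column families in total, each column
# family holding at most 5 MemTables of 128 MB each.
column_families = 4        # total across kvdb + raftdb
max_memtables_per_cf = 5   # max_write_buffer_number
memtable_size_mb = 128     # write_buffer_size

max_memtable_mb = column_families * max_memtables_per_cf * memtable_size_mb
print(max_memtable_mb)         # 2560 MB
print(max_memtable_mb / 1024)  # 2.5 GB
```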

===================================

  1. A TiKV node has two RocksDB instances: one is raftdb and the other is kvdb.
  2. kvdb has 4 ColumnFamilies.
  3. Shouldn’t each RocksDB instance have 2 MemTables (one in use and one being flushed to disk)? In that case, shouldn’t the memory for a single TiKV be 2 × 2 × 128 MB? What does it have to do with the number of ColumnFamilies?
| username: Kamner | Original post link

ColumnFamily = MemTable + SST + Shared WAL

The size limit for a single MemTable in each ColumnFamily is 128MB, with a maximum of 5 MemTables allowed.

This means each ColumnFamily can have up to 5 MemTables serving data reads and writes.

If, as you suggested, there were only one MemTable in use and one being flushed to disk, then whenever the active one filled up before the flush completed, writes would have to wait and data could not be written.

| username: TiDBer_jYQINSnf | Original post link

Each column family has its own set of memtables. For example, if max_write_buffer_number=5, then one TiKV instance would have 4 (column families) × 5 (max_write_buffer_number) = 20 memtables. There is also a per-memtable parameter, write_buffer_size=128MB: once 128 MB has been written, RocksDB rotates to another memtable. After 5 memtables fill up, they are flushed to disk.

Memtables rotate just like log files. Only one memtable is written to at a time; once it is full, writing switches to another one, and after 5 memtables fill up, they are flushed to disk.
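The rotation described above can be sketched as follows. This is a deliberately simplified model under the thread's assumptions (write_buffer_size and max_write_buffer_number as described; the class and field names are hypothetical). In real RocksDB, immutable memtables are flushed asynchronously in the background, and hitting max_write_buffer_number stalls foreground writes rather than flushing synchronously.

```python
class ColumnFamilySketch:
    """Simplified model of MemTable rotation in one column family."""

    def __init__(self, write_buffer_size, max_write_buffer_number):
        self.write_buffer_size = write_buffer_size
        self.max_write_buffer_number = max_write_buffer_number
        self.active = 0       # bytes in the active (writable) memtable
        self.immutable = []   # full memtables waiting to be flushed
        self.flushed = 0      # count of memtables flushed to SST files

    def write(self, nbytes):
        if self.active + nbytes > self.write_buffer_size:
            # Active memtable is full: rotate to a fresh one.
            self.immutable.append(self.active)
            self.active = 0
            if 1 + len(self.immutable) > self.max_write_buffer_number:
                # Memtable limit reached: in this simplified model we
                # flush synchronously; real RocksDB stalls writes while
                # a background thread flushes.
                self.flushed += len(self.immutable)
                self.immutable.clear()
        self.active += nbytes
```

With write_buffer_size=128 and max_write_buffer_number=5, six full 128 MB writes fill five memtables and trigger a flush, matching the "after filling up 5 memtables, it flushes to disk" description above.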

| username: zhanggame1 | Original post link

For this part, I suggest not only looking at the TiDB documentation but also checking out some RocksDB documentation. For example, this article is quite well-written:
深入RocksDB原理 ("An In-Depth Look at RocksDB Internals") - 知乎 (zhihu.com)

| username: wangkk2024 | Original post link

How about having an operating system engineer take a look?

| username: dba远航 | Original post link

ColumnFamily is an intermediate process of the transaction.

| username: QH琉璃 | Original post link

I don’t quite understand this.

| username: 洪七表哥 | Original post link

This is reliable.

| username: zhaokede | Original post link

You can also take a look at the source code.

| username: wangkk2024 | Original post link

Here to learn a bit.

| username: YuchongXU | Original post link

It doesn’t matter.