I deployed a cluster simulated on a single machine and found that memory usage grows rapidly under continuous queries. I understand that an LSM-tree inherently has space amplification. However, only the C0 level of an LSM-tree lives in memory; once the data reaches a certain size, C0 is merged down into the lower Cn levels, which should mainly occupy disk space. Given that compaction, why does memory usage also keep climbing?
Are you referring to tidb-server or TiKV memory usage? It's best to check with top, press Shift+M to sort by memory, and look at the RES and SHR columns to identify which process is actually using the memory. In my previous tests, TiKV memory usage was generally stable, while tidb-server memory often grew quickly.
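The top workflow above can also be done non-interactively. A quick sketch using standard procps tools (process names will vary with your deployment):

```shell
# One-shot equivalent of "top + Shift+M": list processes sorted by
# resident memory (RSS), largest first, so you can see whether it is
# tidb-server or tikv-server that is actually holding the memory.
ps aux --sort=-rss | head -n 10
```

The RSS column is the resident set per process; for co-located TiKV instances, compare their RSS values against the block-cache sizes you configured.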
How did you determine that the memory is occupied by the LSM tree? First break down the memory usage by component. A distributed system will naturally use more memory than a single-node one.
The default value of tidb_enable_clustered_index is INT_ONLY, meaning clustered indexes are enabled only for tables with an integer primary key. If you want clustered indexes on all tables, set this variable to ON.
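If you do want clustered indexes on all new tables, the switch described above can be flipped like this (verify against your TiDB version first, since the default and allowed values of this variable have changed across releases):

```sql
-- Enable clustered indexes for all new tables, not only
-- integer-primary-key ones:
SET GLOBAL tidb_enable_clustered_index = ON;

-- Confirm the current setting:
SHOW VARIABLES LIKE 'tidb_enable_clustered_index';
```

Note this only affects tables created after the change; existing tables keep whatever index organization they were created with.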
Single-machine deployment: 1 TiDB + 3 TiKV + 1 TiFlash + 1 MySQL + several other components.
It's already good if that runs normally at all. With this kind of deployment, it's best to evaluate only functional features. If you want to evaluate performance, the deployment architecture itself is the biggest problem: resource contention among the co-located components is too intense. To measure performance, deploy the components on separate nodes.
For a single-machine deployment, that level of resource usage is normal. tidb-server is written in Go, and releasing memory depends on garbage collection (GC), which takes some time.
Sure, thank you. One thing is still unclear to me: when inserting data under identical conditions, TiDB clearly uses more memory than MySQL. Could that extra memory be due to the space amplification of the LSM tree?
High TiKV memory usage is usually caused by the block-cache capacity being set too large.
You can set this value lower, especially when multiple TiKV instances are deployed on the same machine. This parameter matters a lot and needs to be tuned to a memory size appropriate for each instance.
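As a sketch, the parameter in question is TiKV's shared RocksDB block cache, set via `storage.block-cache.capacity` in the TiKV configuration. The value below is illustrative, not a recommendation; the usual rule of thumb is roughly 45% of the memory available to a single TiKV instance, and much less when several TiKV instances share one machine:

```toml
# tikv.toml -- size of the shared RocksDB block cache.
# Example value only: with 3 TiKV instances co-located on one host,
# give each a capacity small enough that their caches fit together.
[storage.block-cache]
capacity = "4GB"
```

When deploying with TiUP, the same setting can be placed under `server_configs.tikv` in the topology file instead of editing tikv.toml directly.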
Specifically, you can search the forum for how to optimize this parameter.