Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 为什么部署的三台主机的内存分布不均匀? (Why is memory usage uneven across the three deployed hosts?)
[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4.3
In our production cluster, memory usage looks off: one machine is using very little memory. What could be the cause, and how can it be resolved?
Why is your CPU fluctuating like a roller coaster? Did TiDB perform load balancing?
In a three-node mixed deployment, the node hosting the PD leader would be expected to use somewhat more resources.
You can see that server 38 was recently restarted, which cleared its TiKV cache, so its memory usage is lower than that of the other two TiKV nodes. After it has been running for a while, usage should roughly level out.
Once TiKV's block cache has been filled, that memory is not returned to the operating system unless usage exceeds the configured limit. Others have noticed this as well. Besides the restart of server 38, load imbalance may also contribute, for example uneven connection counts across the TiDB nodes or a heavy load on the PD leader.
The node with the roller-coaster CPU is a TiFlash node.
TiDB and PD can be placed together.
TiKV should be deployed separately, one node per machine, without mixing.
Check the resource status in the host list.
With mixed deployment, resource usage cannot be uniform across hosts, which leads to many issues.
When the CPU spikes, check which process on the machine is consuming the most resources. It looks like large queries are responsible.
I suspect that your business load has been concentrated on a particular TiDB, causing the memory usage on that specific machine to be particularly high.
Go to the Top SQL page and check whether the SQL statements executed on each node are similar. I suspect they differ significantly, since the business load may not be evenly distributed. Also, on the load balancer, are the connection counts balanced?
In a mixed deployment, TiKV memory will gradually increase until it reaches the limit. TiDB may experience a sudden significant increase due to large queries.
It is probably caused by mixed deployment. Mixed deployment is not recommended.
If this is a mixed deployment, have you configured resource limits for TiDB and TiKV?
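To make the resource-limit question concrete, here is a rough sketch of how one might budget memory on a shared host before setting TiKV's `storage.block-cache.capacity`. The function name, the example figures, and the idea of applying a 0.45 fraction to the memory left after TiDB, PD, and the OS are all assumptions for illustration, not an official sizing formula; adjust them to your own hosts.

```python
def tikv_block_cache_gib(total_gib, tidb_gib, pd_gib, os_reserve_gib,
                         cache_fraction=0.45):
    """Suggest a block cache size (GiB) for TiKV on a mixed-deployment host.

    All inputs are hypothetical budgets chosen by the operator.
    cache_fraction is an assumed share of the memory left for TiKV.
    """
    left_for_tikv = total_gib - tidb_gib - pd_gib - os_reserve_gib
    if left_for_tikv <= 0:
        raise ValueError("no memory left for TiKV with these budgets")
    return round(left_for_tikv * cache_fraction, 1)


# Example: a 64 GiB host, with 16 GiB budgeted for TiDB, 4 GiB for PD,
# and 4 GiB kept for the OS (all made-up numbers).
suggested = tikv_block_cache_gib(64, 16, 4, 4)
print(f"suggested block cache: {suggested} GiB")
```

The point of the sketch is only that, on a shared host, the cache limit should be derived from what is left after the other components, rather than from the machine's total memory.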
Run the top command on each node and press Shift+M to sort processes by memory usage. Compare the three nodes, and you should be able to identify which component is causing the memory difference.
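If you would rather compare numbers than eyeball top, the same check can be sketched in a few lines. This assumes you have captured the output of `ps -eo rss,comm` on each host (RSS in KiB) and that the processes are named `tidb-server`, `tikv-server`, `pd-server`, and `tiflash`; the sample text below is made up.

```python
from collections import defaultdict

# Process names assumed for a typical TiDB deployment.
COMPONENTS = ("tidb-server", "tikv-server", "pd-server", "tiflash")


def rss_by_component(ps_output):
    """Sum resident memory (GiB) per component from `ps -eo rss,comm` text."""
    totals_kib = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header line and anything that is not "<rss> <command>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        command = parts[1].strip()
        if command in COMPONENTS:
            totals_kib[command] += int(parts[0])
    return {name: round(kib / 1024 / 1024, 2) for name, kib in totals_kib.items()}


# Made-up sample output from one host.
sample = """\
  RSS COMMAND
8388608 tikv-server
2097152 tidb-server
524288 pd-server
1024 sshd
"""
usage = rss_by_component(sample)
print(usage)
```

Running this against the output from all three hosts and putting the results side by side should show whether the gap comes from TiKV, TiDB, or something else.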
It is probably caused by an uneven distribution of Region leaders.
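One way to test the Region leader theory is to look at the per-store leader counts that pd-ctl's `store` command reports as JSON. The sketch below assumes that output has a `stores` list whose entries carry `store.address` and `status.leader_count`; the addresses and counts in the sample are invented.

```python
import json


def leader_share(store_json):
    """Return each store's share of Region leaders, keyed by store address."""
    data = json.loads(store_json)
    counts = {
        s["store"]["address"]: s["status"].get("leader_count", 0)
        for s in data["stores"]
    }
    total = sum(counts.values())
    return {
        addr: round(n / total, 3) if total else 0.0
        for addr, n in counts.items()
    }


# Invented sample in the assumed shape of the pd-ctl store output.
sample = json.dumps({
    "stores": [
        {"store": {"address": "host-a:20160"}, "status": {"leader_count": 600}},
        {"store": {"address": "host-b:20160"}, "status": {"leader_count": 300}},
        {"store": {"address": "host-c:20160"}, "status": {"leader_count": 100}},
    ]
})
shares = leader_share(sample)
print(shares)
```

With three TiKV stores, shares far from one third each would support the uneven-leader explanation. TiFlash stores, if listed, hold no leaders and can be ignored.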