Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: The impact of NUMA core binding on performance when deploying TiDB (TIDB部署时候NUMA绑核对性能影响)
【TiDB Usage Environment】Production Environment
【TiDB Version】V6.5.8
【Encountered Problem: Issue Phenomenon and Impact】The impact of NUMA core binding on performance during TiDB deployment. I have searched many documents online, and some sysbench benchmark results suggest that enabling NUMA binding can improve performance by about 20%. Has anyone tested this in a production environment?
There will be performance differences in production as well. The exact gap depends on the workload, concurrency, and other factors; it becomes especially noticeable once the database starts to hit a bottleneck.
NUMA binding is mainly effective when configured at the TiKV layer, right? Does tidb-server also need to be bound to specific cores? This is my first time using it, so I'm not very familiar with it.
Is NUMA binding only used for mixed deployments? Is core binding still necessary for a normal deployment?
Yes, it is needed. A single tidb-server typically cannot make effective use of more than about 24 CPU cores, so if the machine has many CPUs it is still recommended to bind them. For example, on a machine with 128 cores, 512 GB of RAM, and 8 NUMA nodes, each node has 16 cores and 64 GB of RAM. If this machine only runs tidb-server, it is recommended to deploy 4 or 8 tidb-server instances: with 4, bind each tidb-server to 2 NUMA nodes; with 8, bind each tidb-server to 1 NUMA node.
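As a sketch, the four-instance layout above could be expressed in a TiUP topology file via the `numa_node` field, which TiUP passes to `numactl` when starting each instance (the IPs and ports here are hypothetical, and `numactl` must be installed on the host for the binding to take effect):

```yaml
# Hypothetical excerpt from topology.yaml: four tidb-server instances
# on one 128-core / 8-NUMA-node host, each bound to two NUMA nodes.
tidb_servers:
  - host: 10.0.1.10
    port: 4000
    status_port: 10080
    numa_node: "0,1"   # started as: numactl --cpunodebind=0,1 --membind=0,1 ...
  - host: 10.0.1.10
    port: 4001
    status_port: 10081
    numa_node: "2,3"
  - host: 10.0.1.10
    port: 4002
    status_port: 10082
    numa_node: "4,5"
  - host: 10.0.1.10
    port: 4003
    status_port: 10083
    numa_node: "6,7"
```

For the 8-instance variant, each entry would instead bind to a single node (`numa_node: "0"`, `"1"`, and so on).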
It depends on the machine. Generally, we will bind the cores.
Indeed, that makes sense. Although TiDB can push computation down to TiKV, tidb-server still handles work such as merging globally sorted results, so performance should improve after binding cores. I plan to test in our environment next week. Thanks for the support.
Another point: older machines generally used the UMA (Uniform Memory Access) architecture, whereas current machines use NUMA (Non-Uniform Memory Access). Under NUMA, cross-node memory access reduces CPU and memory efficiency, so binding a process to a NUMA node lets its CPUs access their local memory, which improves efficiency. Binding also provides relatively better resource isolation between instances.
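To see the non-uniformity concretely, a small sketch (assuming a Linux host; the paths are the standard kernel sysfs layout) can read the node-to-node distances the kernel reports, where the diagonal value (typically 10) is local access and larger off-diagonal values represent the extra cost of remote access:

```python
from pathlib import Path

def numa_distances():
    """Return {node_name: [distances to each node]} from sysfs,
    or None when no NUMA information is exposed (e.g. non-Linux host)."""
    base = Path("/sys/devices/system/node")
    nodes = sorted(base.glob("node[0-9]*"))
    if not nodes:
        return None
    return {
        n.name: [int(x) for x in (n / "distance").read_text().split()]
        for n in nodes
    }

if __name__ == "__main__":
    dist = numa_distances()
    if dist is None:
        print("no NUMA sysfs information on this host")
    else:
        # A single-node machine shows one row like: node0 [10].
        # A two-socket host typically shows something like
        # node0 [10, 21] / node1 [21, 10] - the 21s are remote access.
        for node, row in sorted(dist.items()):
            print(node, row)
```

This only reports the topology; the actual binding is done by `numactl` (or TiUP's `numa_node` setting), but checking the distances first tells you how much cross-node traffic is likely to cost on a given machine.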
If a machine hosts only a single node, the difference with or without NUMA binding is not significant. However, if a machine hosts multiple nodes, it is better to bind each instance to a different NUMA node.
You can try it in a test environment first. It’s more practical to conduct an experiment than just reading the documentation, and the conclusions you draw yourself will be more profound.
This is more noticeable on ARM hosts, where the performance gap can be around 5x. On x86, a data warehouse product from a neighboring team measured a gap of about 40% in its tests. However, binding to a NUMA node also limits the memory each process can use, so this trade-off needs to be balanced.
I’m so surprised that there’s such a significant improvement on the ARM architecture.
Okay, planning to test next week.
The ARM architecture is particularly suitable for this kind of operation.
You still need to run comparative tests with NUMA binding enabled and disabled.
It is generally not recommended to use NUMA with most databases.
Let’s see if there are any test reports to study.
Once a conclusion is reached, please update the group.
We are looking forward to it. We plan to deploy on 6 servers and haven't decided yet whether to use NUMA binding or how to lay out the deployment.