Background:
3 physical machines, each with 80 CPUs and 2 NUMA nodes. Each machine hosts 2 TiDB servers, 1 PD, and 1 TiKV, with the 2 TiDB servers bound to the 2 NUMA nodes respectively. In sysbench tests, binding NUMA improved QPS and TPS by about 19% at 10 concurrent threads. However, as concurrency increased, the improvement shrank; at 500 concurrent threads, QPS and TPS were actually about 1% lower than without NUMA binding.
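For reference, the layout described above roughly corresponds to the following TiUP topology fragment for one of the machines (the IP and ports are placeholders, not from the post); `numa_node` is the TiUP field that binds an instance to a NUMA node and requires numactl to be installed on the host:

```yaml
# One of the 3 physical machines; the other two are analogous.
tidb_servers:
  - host: 10.0.1.1          # placeholder IP
    port: 4000
    status_port: 10080
    numa_node: "0"          # first TiDB instance pinned to NUMA node 0
  - host: 10.0.1.1
    port: 4001              # second instance on the same host needs distinct ports
    status_port: 10081
    numa_node: "1"          # second TiDB instance pinned to NUMA node 1
pd_servers:
  - host: 10.0.1.1          # PD not NUMA-bound in this test
tikv_servers:
  - host: 10.0.1.1          # TiKV not NUMA-bound in this test
```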
Is this test result reasonable?
How much performance improvement did you observe after binding NUMA?
You only bound the TiDB servers, but not PD and TiKV? They should all be bound. Also, check the CPU utilization at 500 concurrent threads.
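To illustrate, binding PD and TiKV as well just means adding `numa_node` to their entries too; which node each is pinned to below is an assumption for the sketch, not a recommendation from the thread (the tidb_servers section stays as in the fragment above):

```yaml
pd_servers:
  - host: 10.0.1.1          # placeholder IP, as above
    numa_node: "0"          # pin PD to a NUMA node as well
tikv_servers:
  - host: 10.0.1.1
    numa_node: "1"          # pin TiKV to a NUMA node as well
```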
You mean that each physical machine co-deploys 2 TiDB servers, 1 PD, and 1 TiKV, and the 2 NUMA nodes are each allocated to one of the TiDB servers?
At 500 concurrent threads the CPU utilization is only about 60%. Could PD or TiKV have hit a bottleneck? Check their monitoring to see.
The point of binding is to prevent components from using CPUs across NUMA nodes. Do you have a load balancer such as HAProxy in front of the 6 TiDB servers? You could start by testing with 3 TiDB servers, placing PD and TiDB on one NUMA node and TiKV on the other, then adjust based on actual resource usage.
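A minimal sketch of that suggested layout for one machine, using the same placeholder IP as above: a single TiDB instance sharing NUMA node 0 with PD, and TiKV alone on NUMA node 1; adjust after observing actual resource usage:

```yaml
tidb_servers:
  - host: 10.0.1.1
    port: 4000
    status_port: 10080
    numa_node: "0"          # TiDB shares NUMA node 0 with PD
pd_servers:
  - host: 10.0.1.1
    numa_node: "0"          # PD on the same node as TiDB
tikv_servers:
  - host: 10.0.1.1
    numa_node: "1"          # TiKV gets NUMA node 1 to itself
```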
Is there a load balancer like HAProxy in front of the 6 TiDB servers? ---- Yes, the backend TiDB servers are accessed through HAProxy.
CPU utilization is only around 60%, and each machine deploys just one PD and one TiKV, so why bind NUMA? --> Because two TiDB servers are deployed on each physical machine, and we also wanted to measure how much performance improvement binding NUMA brings to TiDB.