Throughput of TiDB Cluster Does Not Improve During Benchmark Testing

translator_bot · June 22, 2024, 1:44pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: benchmark测试时TiDB集群吞吐上不去

| username: tiertie

[TiDB Usage Environment] CentOS 7.9
[TiDB Version] 6.5

[Reproduction Path] Ran the benchmark with 100 warehouses and 100 TP (transactional) clients, with no AP (analytical) workload added and no cap on maximum throughput.

[Encountered Problem: Phenomenon and Impact] With the same settings, a PostgreSQL (pg) streaming-replication cluster reaches 400,000 tpmC and OceanBase (ob) reaches 240,000 tpmC after tuning, but TiDB reaches only 40,000 tpmC. Putting HAProxy in front of the TiDB servers raises this only to 60,000. We therefore suspect either a problem on our side or that there are tuning methods we have not tried.

[Resource Configuration] Three machines, each running one TiKV and one TiFlash instance, with each instance bound to its own NUMA node.

[Attachment: Screenshot/Logs/Monitoring] A screenshot of the HAProxy configuration is attached.

| username: h5n1

TiDB indeed doesn't show much of an advantage in tpmC tests. Check the following monitoring: TiDB CPU utilization, Thread CPU in the TiKV-Details dashboard, the Raft propose panels, and disk performance in the Overview dashboard. Also look at the slow SQL to see whether there are optimization opportunities.

| username: tiertie

We tested with BenchmarkSQL and did not see any significant slow queries. Are there any tuning suggestions or recommended operational settings for the YAML configuration? Thank you!

| username: Wind-Gone

To keep the comparison with similar products fair, each of our three machines runs one TiKV and one TiFlash instance, each taking half of the machine's resources and bound to its own NUMA node, plus one TiDB compute node. So far, TiDB's throughput simply will not go higher, and we cannot tell whether the deployment mode or the configuration is the cause. Also, if we use stock BenchmarkSQL, are there any TiDB-side TPC-C optimizations we could try? Thank you!

| username: h5n1

The official documentation does not give any TPC-C-specific recommended settings or optimizations. Could you check the monitoring metrics mentioned earlier?

| username: tiertie

Hello, we have collected some monitoring data from the cluster and would like to know whether there are any optimization suggestions. Thank you! (Data after 21:00 can be ignored, because the test had finished by then.)

| username: magic

That shouldn't be the case. Have you tried the TPC-C test that comes with TiUP (tiup bench tpcc)? Our cluster, with 80 cores, 128 GB of RAM, and mechanical hard drives, exceeds 60,000 tpmC.
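
For anyone trying the TiUP-bundled TPC-C driver suggested here, the Python sketch below only assembles tiup bench tpcc command lines; it does not run them. The host addresses, database name, and run duration are hypothetical placeholders (the 100 warehouses and 100 threads mirror the original post), and the flags (-H, -P, -D, --warehouses, --threads, --time) and stages (prepare, run, cleanup) are as I recall them from the TiDB TPC-C guide, so confirm them with tiup bench tpcc --help on your version. Passing all three TiDB servers to -H is intended to spread client connections across them, the same goal as the HAProxy setup in the original post.

```python
import shlex
from typing import List, Sequence

STAGES = ("prepare", "run", "cleanup")


def tpcc_command(
    stage: str,
    hosts: Sequence[str],
    port: int = 4000,
    db: str = "tpcc",
    warehouses: int = 100,
    threads: int = 100,
    duration: str = "10m",
) -> List[str]:
    """Build (but do not run) a `tiup bench tpcc` argument list for one stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    if not hosts:
        raise ValueError("at least one TiDB host is required")
    cmd = [
        "tiup", "bench", "tpcc",
        "-H", ",".join(hosts),        # all TiDB servers, comma-separated
        "-P", str(port),              # TiDB SQL port
        "-D", db,                     # target database
        "--warehouses", str(warehouses),
        "--threads", str(threads),    # concurrent client connections
    ]
    if stage == "run":
        cmd += ["--time", duration]   # length of the measured run
    cmd.append(stage)
    return cmd


if __name__ == "__main__":
    tidb_hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical addresses
    for stage in ("prepare", "run"):
        print(shlex.join(tpcc_command(stage, tidb_hosts)))
```

Printing the commands keeps the sketch free of side effects; swapping print for subprocess.run(cmd, check=True) would actually execute them.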

| username: magic

I think TiKV's CPU usage still needs tuning: several metrics in the monitoring have already reached the warning line.
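
To make "reached the warning line" concrete: the TiKV Grafana Thread CPU panels are usually read against a per-pool ceiling of roughly 80% (raftstore, gRPC) or 90% (apply, scheduler) of one core per configured thread. The Python sketch below applies that rule of thumb to CPU readings copied from the dashboard. The threshold fractions are my recollection of the TiKV monitoring docs and the sample readings are hypothetical, so check both against your own version and panels.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed rule-of-thumb ceilings: fraction of one core per configured thread.
WARN_FRACTION: Dict[str, float] = {
    "raftstore": 0.80,  # raftstore.store-pool-size
    "apply": 0.90,      # raftstore.apply-pool-size
    "scheduler": 0.90,  # storage.scheduler-worker-pool-size
    "grpc": 0.80,       # server.grpc-concurrency
}


@dataclass
class PoolSample:
    name: str          # key into WARN_FRACTION
    cpu_cores: float   # observed CPU in cores, e.g. 1.9 means 190%
    pool_size: int     # configured number of threads for this pool


def saturated(samples: List[PoolSample]) -> List[str]:
    """Return the pools whose CPU exceeds warn fraction * pool size, in input order."""
    hot = []
    for s in samples:
        if s.name not in WARN_FRACTION:
            raise ValueError(f"unknown pool: {s.name}")
        limit = WARN_FRACTION[s.name] * s.pool_size
        if s.cpu_cores > limit:
            hot.append(s.name)
    return hot


if __name__ == "__main__":
    # Hypothetical readings, not taken from this thread's screenshots.
    readings = [
        PoolSample("raftstore", cpu_cores=1.9, pool_size=2),
        PoolSample("apply", cpu_cores=1.0, pool_size=2),
        PoolSample("grpc", cpu_cores=4.2, pool_size=5),
    ]
    print(saturated(readings))  # → ['raftstore', 'grpc']
```

A pool that shows up here is a candidate for more threads only if the machine (or NUMA node) still has idle cores to give it.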

| username: h5n1

Refer to this document to increase the number of raftstore, gRPC, apply, and scheduler threads:
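
As a hedged illustration, the TiKV settings that usually correspond to these pools are raftstore.store-pool-size, raftstore.apply-pool-size, server.grpc-concurrency, and storage.scheduler-worker-pool-size. The Python sketch below renders them as a server_configs.tikv fragment for a TiUP topology file. The pool-to-key mapping is my assumption about what the referenced document covers, and the thread counts in the example are hypothetical, not recommendations.

```python
from typing import Dict

# Assumed mapping from the pools named above to TiKV configuration keys.
THREAD_KEYS: Dict[str, str] = {
    "raftstore": "raftstore.store-pool-size",
    "apply": "raftstore.apply-pool-size",
    "grpc": "server.grpc-concurrency",
    "scheduler": "storage.scheduler-worker-pool-size",
}


def tikv_thread_overrides(sizes: Dict[str, int]) -> str:
    """Render a TiUP `server_configs.tikv` YAML fragment from pool -> thread count."""
    unknown = set(sizes) - set(THREAD_KEYS)
    if unknown:
        raise ValueError(f"unknown pools: {sorted(unknown)}")
    lines = ["server_configs:", "  tikv:"]
    for pool, key in THREAD_KEYS.items():  # fixed order keeps the output stable
        if pool not in sizes:
            continue
        n = sizes[pool]
        if n < 1:
            raise ValueError(f"{pool} needs at least one thread, got {n}")
        lines.append(f"    {key}: {n}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical thread counts; size them from your own Thread CPU panels.
    print(tikv_thread_overrides({"raftstore": 4, "apply": 4, "grpc": 8, "scheduler": 8}), end="")
```

The fragment would then be merged into the cluster configuration with tiup cluster edit-config <cluster-name> and rolled out with tiup cluster reload <cluster-name> -R tikv. With each TiKV confined to one NUMA node, extra threads compete for the same cores, so raise only the pools whose Thread CPU panels are actually near their ceiling.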

| username: caiyfc

Is the network gigabit? The network throughput looks capped at 100 Mbps.
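
One quick way to rule out a NIC that negotiated 100 Mbps instead of 1 or 10 Gbps is to read the link speed Linux exposes in /sys/class/net/<iface>/speed (ethtool <iface> reports the same value). It is given in Mb/s and is unreadable or -1 when unknown, for example on loopback or a link that is down. The Python sketch below lists that value per interface; interface names depend on the host. For scale, a 100 Mbps link carries at most about 12.5 MB/s before protocol overhead.

```python
import os
from typing import Dict, Optional

SYSFS_NET = "/sys/class/net"


def parse_speed(raw: str) -> Optional[int]:
    """Parse a sysfs `speed` value in Mb/s; None for unknown (-1) or invalid text."""
    try:
        value = int(raw.strip())
    except ValueError:
        return None
    return value if value > 0 else None


def link_speeds(root: str = SYSFS_NET) -> Dict[str, Optional[int]]:
    """Map each interface under `root` to its negotiated link speed in Mb/s."""
    if not os.path.isdir(root):
        return {}  # not Linux, or sysfs not mounted
    speeds: Dict[str, Optional[int]] = {}
    for iface in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, iface, "speed")) as f:
                speeds[iface] = parse_speed(f.read())
        except OSError:
            speeds[iface] = None  # e.g. loopback or virtual NICs, where the read fails
    return speeds


def mbps_to_mb_per_s(mbps: float) -> float:
    """Megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


if __name__ == "__main__":
    for iface, speed in link_speeds().items():
        if speed is None:
            print(f"{iface}: unknown")
        else:
            print(f"{iface}: {speed} Mb/s (~{mbps_to_mb_per_s(speed):.1f} MB/s)")
```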

| username: tiertie

It's actually a 10-gigabit network, haha.
