[Reproduction Path] Set 100 warehouses and 100TP during benchmark testing, without adding AP and without limiting maximum throughput.
[Encountered Problem: Phenomenon and Impact] Under the same settings, the pg streaming replication cluster can reach 400,000 tpmC, and ob can reach 240,000 after tuning, but TiDB can only reach 40,000. After applying haproxy, it only reaches 60,000. Therefore, we suspect that there might be an issue on our side or if there are other tuning methods available.
[Resource Configuration] Three machines, each with one TiKV and one TiFlash, both bound to one NUMA core.
[Attachment: Screenshot/Logs/Monitoring] Attached is a screenshot of the haproxy configuration.
Testing tpmc on TiDB indeed doesn’t show much advantage. Check the monitoring for TiDB CPU utilization, thread-cpu in tikv-detail, raft propose monitoring, and disk performance monitoring in the overview. Additionally, take a look at slow SQL to see if there are optimization opportunities.
We used benchmarksql for testing and did not encounter any significant slow queries. I would like to ask if there are any tuning suggestions or operational recommendations for the YAML configuration. Thank you!
To ensure fairness in testing with similar products, our setup involves having one TiKV and one TiFlash on each of the three machines, each occupying half of the system resources, bound to independent NUMA cores, and each having a TiDB compute node. Currently, the throughput of TiDB indeed cannot be increased. We are unsure if this is due to the deployment mode or configuration issues. Thank you! Additionally, if using the native BenchmarkSQL, are there any optimizations for TPC-C in TiDB that we could try applying?
The official documentation does not provide specific recommended settings or optimizations for TPC-C. Could you please check the related monitoring metrics mentioned earlier?
Hello, we have collected some monitoring information from the cluster and would like to know if there are any optimization suggestions. Thank you! (Data after 21:00 is not of reference value as the process was completed by then)
It shouldn’t be the case. Have you tried the tpcc test that comes with tiup? Our cluster with 80 cores, 128GB RAM, and mechanical hard drives exceeds 60,000.