I Need Advice for Optimizing TiDB Performance for Large-Scale Workloads

Hello there,

I am managing a TiDB cluster and have been noticing some performance bottlenecks as our data and workload scale up. We have been using TiDB for about a year, and it's been a fantastic experience overall. However, as our data size has grown, so has the complexity of maintaining optimal performance.

We are running a three-node TiDB cluster, with TiKV and PD also distributed across three nodes.

The total data size is approaching 2 TB, with significant write and read operations.
Our workload consists mainly of OLTP transactions, but we also run complex analytical queries occasionally.

What are the best practices for indexing in TiDB to handle both high write throughput and complex read queries? Are there specific index types or configurations that are more effective for larger datasets? :shushing_face:

How can we optimize our schema to reduce latency and improve performance? Are there specific design patterns or anti-patterns to be aware of when scaling TiDB?

Which metrics should we focus on to diagnose and address performance issues? Also, what tools or techniques do you recommend for monitoring and tuning TiDB performance? :thinking:

I have also gone through this post: https://ask.pingcap.com/t/subquery-execution-plan-blue-prism/, which definitely helped me out a lot.

As we consider scaling our cluster; what hardware specifications are crucial for maintaining performance? Are there particular configurations that have worked well for others in the community? :thinking:

Thank you in advance for your help. :innocent:

Hi @roberrttt, glad to hear you're enjoying TiDB! Here are some suggestions for your questions.

  1. For the complex analytical queries: TiDB has a component named TiFlash, a columnar storage engine. Combined with the row-based TiKV, it is what makes TiDB an HTAP database, i.e. OLTP + OLAP in one system. FYI, see the TiFlash docs; there is also a small sketch of enabling a TiFlash replica after this list.
  2. High write throughput: thanks to TiDB's distributed architecture, you can scale out (add TiDB or TiKV nodes) to raise the cluster's write capacity. If the bottleneck is a single table (a write hotspot), we suggest using AUTO_RANDOM instead of AUTO_INCREMENT to generate the primary key. FYI, see the AUTO_RANDOM docs; a sketch follows below as well.
  3. For diagnosing and addressing performance issues: start with TiDB Dashboard and the Grafana/Prometheus monitoring that ships with the cluster, and check the slow query log first. FYI, a sketch of querying it follows below.
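
For point 1, here is a minimal sketch of enabling a TiFlash replica and checking its sync status. The table name `orders`, the database `test`, and the connection details are placeholders, and it assumes the Python `pymysql` driver (TiDB speaks the MySQL protocol, default port 4000):

```python
import pymysql

# Connection details are placeholders; TiDB listens on port 4000 by default.
conn = pymysql.connect(host="127.0.0.1", port=4000,
                       user="root", password="", database="test")

with conn.cursor() as cur:
    # Ask TiDB to maintain one columnar TiFlash replica of the table.
    cur.execute("ALTER TABLE orders SET TIFLASH REPLICA 1")

    # Check replication progress: AVAILABLE = 1 means the replica is ready
    # and the optimizer can route analytical scans to TiFlash.
    cur.execute(
        "SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
        "FROM information_schema.tiflash_replica "
        "WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'orders'"
    )
    print(cur.fetchone())

conn.close()
```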
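
For point 2, a sketch of the AUTO_RANDOM pattern with a hypothetical `events` table. AUTO_RANDOM scrambles the high bits of generated keys so consecutive inserts spread across TiKV regions, instead of all landing on the tail region the way sequential AUTO_INCREMENT keys do:

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000,
                       user="root", password="", database="test")

with conn.cursor() as cur:
    # AUTO_RANDOM is TiDB-specific and requires a BIGINT primary key.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id BIGINT PRIMARY KEY AUTO_RANDOM,
            payload VARCHAR(255)
        )
    """)

    # Omit the id on insert; TiDB fills it with a randomized value.
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))
    conn.commit()

conn.close()
```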
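
For point 3, a sketch of pulling the ten slowest recent statements from TiDB's built-in `information_schema.slow_query` table, which is a quick first step before digging into TiDB Dashboard or Grafana:

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000,
                       user="root", password="", database="test")

with conn.cursor() as cur:
    # TiDB exposes its slow query log as a system table.
    cur.execute(
        "SELECT time, query_time, db, query "
        "FROM information_schema.slow_query "
        "ORDER BY query_time DESC LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)

conn.close()
```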

Happy hacking!
