Understanding Monitoring and Tuning

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于监控的理解以及调优

| username: zqk_zqk

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] Data collection workload (SELECT and INSERT operations executed many times); the largest table has tens of millions of rows.
[Encountered Problem: Phenomenon and Impact] High latency and very slow reads and writes. We have only just started using TiDB; we have learned a lot of theory but have no feel for data scale. We hope the experts can explain what data scale counts as large.
[Resource Configuration] PD*3: 16 cores, 8 GB; TiKV*3: 16 cores, 16 GB; TiDB*3: one with 64 cores and 128 GB, the others with 16 cores and 16 GB
[Attachments: Screenshots/Logs/Monitoring]

| username: 我是咖啡哥 | Original post link

Your TiKV memory is a bit small. For tables with tens of millions of rows, first check the slow SQL to see where the bottleneck is.
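
One way to start checking slow SQL is to query TiDB's slow query table directly. The sketch below only builds the SQL text. The helper name `build_slow_query_sql`, the one-hour window, and the row limit are illustrative assumptions, and it assumes that `INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY` and the columns it names are available in your TiDB version. Run the resulting statement with whatever MySQL-compatible client you already use.

```python
def build_slow_query_sql(hours: int = 1, limit: int = 10) -> str:
    """Build SQL that lists the slowest recent user statements.

    Assumption: the cluster exposes INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY
    with the Time, DB, Query_time, Query and Is_internal columns.
    """
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return (
        "SELECT Time, DB, Query_time, Query "
        "FROM INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY "
        # Skip TiDB's own internal statements and keep only the recent window.
        f"WHERE Is_internal = false AND Time > NOW() - INTERVAL {int(hours)} HOUR "
        # Slowest statements first.
        f"ORDER BY Query_time DESC LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_slow_query_sql(hours=1, limit=10))
```

If the same few statements keep appearing at the top, the bottleneck is probably in those queries; if everything is uniformly slow, look at cluster resources instead.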

| username: 近墨者zyl | Original post link

From the screenshot, the overall duration is very high, so the problem is not a few slow SQL queries but the overall resource configuration of the cluster. The TiKV memory is very small, and the TiDB server is also hitting a bottleneck. You should first increase the physical resources. For now, your system can handle 100 concurrent OLTP connections.
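
To make "TiKV memory is very small" concrete, here is a rough back-of-the-envelope sketch. It assumes TiKV sizes its block cache at about 45% of system memory when `storage.block-cache.capacity` is not set explicitly; that ratio is an assumption to verify against the documentation for your version, and the function name is hypothetical.

```python
def estimate_block_cache_gib(node_memory_gib: float, cache_ratio: float = 0.45) -> float:
    """Estimate the TiKV block cache size for one node, in GiB.

    Assumption: the cache is sized as a fixed fraction of system memory
    (0.45 here); pass the ratio that applies to your configuration.
    """
    if node_memory_gib <= 0:
        raise ValueError("node_memory_gib must be positive")
    if not 0 < cache_ratio < 1:
        raise ValueError("cache_ratio must be between 0 and 1")
    return node_memory_gib * cache_ratio


if __name__ == "__main__":
    # The TiKV nodes in this thread have 16 GB of memory.
    per_node = estimate_block_cache_gib(16)
    print(f"Estimated block cache per TiKV node: {per_node:.1f} GiB")
```

Under that assumption, well under half of each 16 GB node is available for caching data, which is why reads on tables with tens of millions of rows can end up going to disk often.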

| username: Jiawei | Original post link

Check the TiKV memory usage and the slow query ranking in the dashboard, then click into a query to see exactly where it is slow.
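
When drilling into a single slow query, it helps to compare the time spent in each phase. The sketch below is a minimal illustration that picks the largest phase from one slow-query record. The function name is hypothetical, the phase list is modeled on TiDB slow-log fields (`Parse_time`, `Compile_time`, `Process_time`, `Wait_time`, `Backoff_time`) and is not exhaustive, and the sample values are made up for demonstration.

```python
# Phase fields modeled on the TiDB slow log; treat the exact set as an assumption.
PHASES = ("Parse_time", "Compile_time", "Process_time", "Wait_time", "Backoff_time")


def slowest_phase(record: dict) -> tuple:
    """Return (phase_name, seconds) for the phase that took the longest.

    Fields missing from the record are treated as 0.0 seconds.
    """
    name = max(PHASES, key=lambda phase: float(record.get(phase, 0.0)))
    return name, float(record.get(name, 0.0))


if __name__ == "__main__":
    # Illustrative values only, not taken from the cluster in this thread.
    sample = {
        "Parse_time": 0.001,
        "Compile_time": 0.002,
        "Process_time": 1.5,
        "Wait_time": 0.3,
    }
    phase, seconds = slowest_phase(sample)
    print(f"Most time spent in {phase}: {seconds:.3f}s")
```

A query dominated by processing time in TiKV suggests a scan or missing index, while one dominated by wait or backoff time points more toward resource contention on the storage nodes.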