Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIDB 如何通过工具获取到底层rocksdb的监控数据 (How can TiDB obtain the monitoring data of the underlying RocksDB through tools?)
【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.7.1
【Encountered Problem: Problem Phenomenon and Impact】How can we use tools to obtain the monitoring data of the underlying RocksDB in TiDB【can tikv-ctl do this?】, including the structure of each level and metrics such as the following:
rocksdb.write.wal
rocksdb.block.cache.add
In Grafana, the tsp-prod-tidb-cluster-TiKV-Details dashboard already has detailed RocksDB monitoring.
The information there is still not very complete.
The address tikv-ip:9100/metrics exposes all the data collected by Prometheus monitoring; you can extract what you need and build your own dashboard in Grafana.
One more question: if the Regions are unevenly distributed, is there a way to trigger a rebalance with a command?
Post a screenshot so we can see the Region distribution on each TiKV.
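You can also check the distribution from the command line. A minimal sketch with pd-ctl (the PD address is a placeholder, adjust it to your deployment):

# List all stores; each entry includes region_count, leader_count and the scores PD uses for balancing
pd-ctl -u http://<pd-ip>:2379 store

Comparing region_count and leader_count across stores quickly shows whether the distribution is actually uneven.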
There is also a lot of TiKV monitoring information at tikv-ip:20180/metrics.
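As far as I know, TiKV exports the RocksDB statistics under its own tikv_engine_ metric prefix rather than the raw ticker names like rocksdb.write.wal, so a practical approach is to dump the status port and grep for what you need. A rough sketch (the address is a placeholder, and the exact metric names can differ between versions, so check the grep output first):

# Dump everything the TiKV status port exposes
curl -s http://<tikv-ip>:20180/metrics > tikv_metrics.txt

# All RocksDB engine metrics are under the tikv_engine_ prefix
grep '^tikv_engine_' tikv_metrics.txt | less

# For example, the per-level structure and block-cache activity
grep 'tikv_engine_num_files_at_level' tikv_metrics.txt
grep 'tikv_engine_block_cache' tikv_metrics.txt

These are the same series the TiKV-Details dashboard is built on, so any missing panels can be added to Grafana from them.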
You can do it manually with the pd-ctl command; for more details, check the PD Control documentation.
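For reference, a minimal sketch of moving a Region manually with pd-ctl (the Region ID and store IDs below are placeholders; take real values from the region and store output of your own cluster):

# Move the peer of Region 123 from store 4 to store 5
pd-ctl -u http://<pd-ip>:2379 operator add transfer-peer 123 4 5

# Or only move the leader of Region 123 to store 5
pd-ctl -u http://<pd-ip>:2379 operator add transfer-leader 123 5

# Confirm the operator was created and watch it run
pd-ctl -u http://<pd-ip>:2379 operator show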
In pd-ctl I can see the hot store, and there are many hot Regions on it, but it seems it still can't balance some of these Regions away. Is there a command to systematically re-balance the Regions on this store?
Currently the cluster has 25 stores in total, but only 2 of them are under high load. pd-ctl also shows that there is a write hotspot issue.
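For reference, the hotspot statistics can be pulled directly from pd-ctl like this (a sketch; the address and the topwrite limit are placeholders):

# Write hotspot statistics, grouped by store, with the hot Region IDs on each
pd-ctl -u http://<pd-ip>:2379 hot write

# Read hotspots, if relevant
pd-ctl -u http://<pd-ip>:2379 hot read

# Top 10 Regions by write traffic
pd-ctl -u http://<pd-ip>:2379 region topwrite 10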
Check the scheduler command in pd-ctl. Running scheduler show returns:
[
  "balance-hot-region-scheduler",
  "balance-leader-scheduler",
  "balance-region-scheduler",
  "split-bucket-scheduler"
]
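Since balance-hot-region-scheduler is already enabled, the next step is to look at its own configuration. As far as I know pd-ctl can view and tune it like this (the tolerance values below are only illustrative; double-check the parameter names against the PD Control docs for your version):

# Show the current configuration of the hot-Region scheduler
pd-ctl -u http://<pd-ip>:2379 scheduler config balance-hot-region-scheduler

# Example: lower the tolerance ratios so the scheduler balances more aggressively
pd-ctl -u http://<pd-ip>:2379 scheduler config balance-hot-region-scheduler set src-tolerance-ratio 1.02
pd-ctl -u http://<pd-ip>:2379 scheduler config balance-hot-region-scheduler set dst-tolerance-ratio 1.02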
I found some PD hot-region scheduling parameters and will try modifying them to see the effect.
From what I found online, for a hotspot store [one with a relatively large number of hot Regions] in an existing cluster, you can only alleviate the problem first by adjusting the PD scheduling parameters.
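For reference, these parameters can be adjusted with pd-ctl; a rough sketch (check config show first; the values below are only examples):

# Current scheduling configuration
pd-ctl -u http://<pd-ip>:2379 config show

# Allow more concurrent hot-Region scheduling operators (default is 4, as I recall)
pd-ctl -u http://<pd-ip>:2379 config set hot-region-schedule-limit 8

# Lower this so PD recognizes a hotspot sooner (default is 3, i.e. a Region must stay hot for 3 minutes)
pd-ctl -u http://<pd-ip>:2379 config set hot-region-cache-hits-threshold 2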