Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIDB 如何通过工具获取到底层rocksdb的监控数据 (How can TiDB obtain the monitoring data of the underlying RocksDB through tools?)
【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.7.1
【Encountered Problem: Problem Phenomenon and Impact】How can we use tools to obtain the monitoring data of the underlying RocksDB in TiDB【can tikv-ctl do this?】, including the structure of each level and metrics such as the following:
rocksdb.write.wal
rocksdb.block.cache.add
In Grafana, the tsp-prod-tidb-cluster-TiKV-Details dashboard already has detailed RocksDB monitoring.
The information there is still not very complete.
The address tikv-ip:9100/metrics exposes all the data collected by Prometheus monitoring; you can extract what you need and build your own dashboard in Grafana.
One more question: if the Regions are unevenly distributed, is there a way to trigger a rebalance with a command?
Post a screenshot so we can see the Region distribution on each TiKV.
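You can also check the distribution from the command line. A minimal sketch with pd-ctl (the PD address is a placeholder, adjust it to your deployment):

# List all stores; each entry includes region_count, leader_count and the scores PD uses for balancing
pd-ctl -u http://<pd-ip>:2379 store

Comparing region_count and leader_count across stores quickly shows whether the distribution is actually uneven.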
There is also a lot of TiKV monitoring information at tikv-ip:20180/metrics.
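As far as I know, TiKV exports the RocksDB statistics under its own tikv_engine_ metric prefix rather than the raw ticker names like rocksdb.write.wal, so a practical approach is to dump the status port and grep for what you need. A rough sketch (the address is a placeholder, and the exact metric names can differ between versions, so check the grep output first):

# Dump everything the TiKV status port exposes
curl -s http://<tikv-ip>:20180/metrics > tikv_metrics.txt

# All RocksDB engine metrics are under the tikv_engine_ prefix
grep '^tikv_engine_' tikv_metrics.txt | less

# For example, the per-level structure and block-cache activity
grep 'tikv_engine_num_files_at_level' tikv_metrics.txt
grep 'tikv_engine_block_cache' tikv_metrics.txt

These are the same series the TiKV-Details dashboard is built on, so any missing panels can be added to Grafana from them.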
You can do it manually with the pd-ctl command; for more details, check the PD Control documentation.
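For reference, a minimal sketch of moving a Region manually with pd-ctl (the Region ID and store IDs below are placeholders; take real values from the region and store output of your own cluster):

# Move the peer of Region 123 from store 4 to store 5
pd-ctl -u http://<pd-ip>:2379 operator add transfer-peer 123 4 5

# Or only move the leader of Region 123 to store 5
pd-ctl -u http://<pd-ip>:2379 operator add transfer-leader 123 5

# Confirm the operator was created and watch it run
pd-ctl -u http://<pd-ip>:2379 operator show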
In pd-ctl I can see the hot store, and there are many hot Regions on it, but it seems it still can't balance some of these Regions away. Is there a command to systematically re-balance the Regions on this store?
Currently the cluster has 25 stores in total, but only 2 of them are under high load. pd-ctl also shows that there is a write hotspot issue.
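For reference, the hotspot statistics can be pulled directly from pd-ctl like this (a sketch; the address and the topwrite limit are placeholders):

# Write hotspot statistics, grouped by store, with the hot Region IDs on each
pd-ctl -u http://<pd-ip>:2379 hot write

# Read hotspots, if relevant
pd-ctl -u http://<pd-ip>:2379 hot read

# Top 10 Regions by write traffic
pd-ctl -u http://<pd-ip>:2379 region topwrite 10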
Check the scheduler command in pd-ctl. Running scheduler show returns:
[
  "balance-hot-region-scheduler",
  "balance-leader-scheduler",
  "balance-region-scheduler",
  "split-bucket-scheduler"
]
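Since balance-hot-region-scheduler is already enabled, the next step is to look at its own configuration. As far as I know pd-ctl can view and tune it like this (the tolerance values below are only illustrative; double-check the parameter names against the PD Control docs for your version):

# Show the current configuration of the hot-Region scheduler
pd-ctl -u http://<pd-ip>:2379 scheduler config balance-hot-region-scheduler

# Example: lower the tolerance ratios so the scheduler balances more aggressively
pd-ctl -u http://<pd-ip>:2379 scheduler config balance-hot-region-scheduler set src-tolerance-ratio 1.02
pd-ctl -u http://<pd-ip>:2379 scheduler config balance-hot-region-scheduler set dst-tolerance-ratio 1.02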
I found some PD hot-region scheduling parameters and will try modifying them to see the effect.
From what I found online, for a hotspot store [one with a relatively large number of hot Regions] in an existing cluster, you can only alleviate the problem first by adjusting the PD scheduling parameters.
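For reference, these parameters can be adjusted with pd-ctl; a rough sketch (check config show first; the values below are only examples):

# Current scheduling configuration
pd-ctl -u http://<pd-ip>:2379 config show

# Allow more concurrent hot-Region scheduling operators (default is 4, as I recall)
pd-ctl -u http://<pd-ip>:2379 config set hot-region-schedule-limit 8

# Lower this so PD recognizes a hotspot sooner (default is 3, i.e. a Region must stay hot for 3 minutes)
pd-ctl -u http://<pd-ip>:2379 config set hot-region-cache-hits-threshold 2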