Querying the Recovery Progress of the Third TiKV Replica

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV 第三个副本恢复进度查询

| username: residentevil

【TiDB Usage Environment】Production Environment
【TiDB Version】V6.1.7
【Encountered Problem: Problem Phenomenon and Impact】 After a TiKV node was damaged, a new node was registered, so the Region data for the third replica has to be restored onto it. Is it possible to get the current progress (as a percentage) and the current recovery speed (in MB) from a view or by calling the PD API?

| username: hey-hoho

In Grafana, the TiKV dashboard has monitoring charts for the Region count and the leader count of each store. These two charts can be used to estimate the progress of Region migration. For the recovery speed, refer to the operator monitoring on the PD dashboard.

| username: zhanggame1

If the TiKV disks are the same size, the three TiKV nodes will eventually hold roughly the same number of Regions and leaders. The speed is hard to estimate.
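As a rough illustration of that idea, the sketch below turns per-store Region counts into a progress percentage. It assumes a three-store, three-replica cluster, where every healthy store holds a replica of every Region, so the new store's target is the count on a healthy peer. The store names and numbers are made up, and `estimate_progress` is a hypothetical helper, not something TiDB provides.

```python
def estimate_progress(region_counts, new_store):
    """Return the fraction (0.0 to 1.0) of its expected Regions the new store holds.

    region_counts: dict mapping store name -> current Region count.
    Assumption: the new store should end up with as many Regions as the
    fullest healthy store (true for 3 stores with 3 replicas).
    """
    others = [count for store, count in region_counts.items() if store != new_store]
    target = max(others, default=0)
    if target == 0:
        return 1.0  # nothing to recover
    return min(region_counts[new_store] / target, 1.0)


# Hypothetical counts read off the Grafana Region chart.
counts = {"tikv-1": 4000, "tikv-2": 4000, "tikv-3": 1000}
print(f"{estimate_progress(counts, 'tikv-3'):.0%}")  # prints 25%
```

With more than three stores the target would be closer to the cluster-wide average, so the assumption needs adjusting for larger clusters.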

| username: Fly-bird

Without exact data, you can estimate based on disk size.

| username: tidb菜鸟一只

Just check the balance of leaders and Regions in Grafana. Once the leaders are balanced, the application is no longer affected. Once the Regions are balanced, availability is fully restored.

| username: chenhanneu

In the PD monitoring category in Grafana, the speed and time are shown while the cluster is scaling out or in.
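If only raw Region counts are available, speed and remaining time can be approximated from two samples taken some time apart. The following is a minimal sketch under that assumption. `recovery_rate` and the sample numbers are hypothetical, and the result is in Regions per second rather than MB.

```python
def recovery_rate(count_then, count_now, seconds_elapsed, target_count):
    """Return (regions_per_second, eta_seconds) from two Region-count samples."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    rate = (count_now - count_then) / seconds_elapsed
    remaining = max(target_count - count_now, 0)
    # No forward progress means the remaining time cannot be estimated.
    eta = remaining / rate if rate > 0 else float("inf")
    return rate, eta


# Hypothetical samples: 1000 Regions, then 1600 Regions five minutes later,
# with 4000 Regions expected on the new store.
rate, eta = recovery_rate(1000, 1600, 300, 4000)
print(f"{rate:.1f} Regions/s, about {eta / 60:.0f} min left")
```

Multiplying the rate by an average Region size would give a rough figure in MB, but Region sizes vary, so treat any such number as an estimate.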

| username: residentevil

In the TiKV-Details dashboard, I see there is a Region count panel. Its query is: sum(tikv_raftstore_region_count{k8s_cluster="$k8s_cluster", tidb_cluster="$tidb_cluster", instance=~"$instance", type="region"}) by (instance). Where does the tikv_raftstore_region_count metric come from?
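The same expression can also be evaluated outside Grafana through the Prometheus HTTP API (`/api/v1/query`). The sketch below is one way to do that with the Python standard library. The Prometheus address is a placeholder, the Grafana variables (`$k8s_cluster`, `$tidb_cluster`, `$instance`) are dropped for simplicity, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Simplified form of the panel query, without the Grafana template variables.
QUERY = 'sum(tikv_raftstore_region_count{type="region"}) by (instance)'


def build_query_url(prometheus_base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Map instance -> value from a Prometheus instant-query JSON response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def region_counts(prometheus_base):
    """Fetch the current Region count per TiKV instance."""
    with urlopen(build_query_url(prometheus_base, QUERY), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))


# Usage (placeholder address, replace with your own Prometheus):
# print(region_counts("http://127.0.0.1:9090"))
```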

| username: residentevil

In the METRICS_SCHEMA database: select * from tikv_region_count where type='leader' and instance='10.xxxx'; With this query, I can track the progress of the migration, right?

| username: 有猫万事足

Find the TiKV targets on the Prometheus targets page, open a target's address, and search the page for the metric name. If these monitoring components were not deployed before you started the integration, this will be difficult.
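The page behind a Prometheus target is plain text in the Prometheus exposition format, so the metric can also be pulled out with a small script instead of a manual search. The sketch below parses such text. The sample lines and their values are made up for illustration, and `parse_region_count` is a hypothetical helper.

```python
import re

# Matches lines such as: tikv_raftstore_region_count{type="region"} 3600
LINE = re.compile(r'^tikv_raftstore_region_count\{([^}]*)\}\s+(\S+)')


def parse_region_count(metrics_text):
    """Return {type label: value} for tikv_raftstore_region_count samples."""
    result = {}
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comment lines and other metrics are skipped
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group(1)))
        result[labels.get("type", "")] = float(match.group(2))
    return result


# Made-up excerpt of a metrics page.
sample = """\
# TYPE tikv_raftstore_region_count gauge
tikv_raftstore_region_count{type="leader"} 1200
tikv_raftstore_region_count{type="region"} 3600
"""
print(parse_region_count(sample))  # {'leader': 1200.0, 'region': 3600.0}
```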

| username: residentevil

It seems that the tikv_raftstore_region_count data can be obtained both from the metrics page and from the tikv_region_count table. :+1: