What do the left-column and right-column metrics in the Region Heartbeat panel of the PD monitoring module mean?

translator_bot · June 21, 2024, 11:35pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 运维监控PD模块中 Region Heartbeat 左栏和右栏指标是何含义？

| username: TiDB_Paper

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.3.0
[Encountered Problem: Phenomenon and Impact]
The Region bar chart appears to indicate a read/write hotspot, so I split the index manually with the following SQL:
SPLIT TABLE business_bucket_directory_node INDEX uk_uuid BETWEEN ("0") AND ("z") REGIONS 6;

However, after the split, the monitoring panel still shows a prominent spike. Does anyone know what the metrics in the left and right columns mean? I checked the documentation but couldn’t find a clear explanation.

[Resource Configuration] k8s: 64g/16c/4 nodes
[Attachment: Screenshot/Log/Monitoring]

| username: Billmay表妹

Which metrics are you referring to? The image is not clear, so it would be best to list them as text.

| username: TiDB_Paper

As shown above, I marked it with a red box.

| username: TiDB_Paper

My guess: the left column is the end of the Region's range, and the right column is the average read-bytes ratio. However, the red spike is still there even after the manual split, so there is probably still a write hotspot.

| username: redgame

Your guess is exactly right.

| username: TiDB_Paper

However, even after scattering the Regions here, the write hotspot is still present.

| username: 有猫万事足

For diagnosing hotspots, the official documentation consistently recommends the Key Visualizer (traffic visualization) page in TiDB Dashboard.

My own traffic visualization heatmap has no large bright spots, yet this monitoring panel shows a similar pattern on my cluster.

If I had to point at one problem: uk_uuid is presumably an index on a character-type UUID column. UUID strings contain only hexadecimal digits, so they never reach ‘z’; the highest letter is ‘f’. The lower and upper bounds used in the split therefore look inappropriate.

Theoretically, it should be from 00000000-0000-0000-0000-000000000000 to ffffffff-ffff-ffff-ffff-ffffffffffff.
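
The adjusted bounds could be applied roughly as follows. This is only a sketch: it assumes uk_uuid is a single-column index on lowercase, hyphenated UUID strings, and the Region count of 16 is an illustrative value, not a recommendation.

```sql
-- Assumption: uk_uuid indexes one column holding lowercase, hyphenated UUID strings.
-- Split the index range evenly between the smallest and largest possible UUID.
SPLIT TABLE business_bucket_directory_node INDEX uk_uuid
    BETWEEN ('00000000-0000-0000-0000-000000000000')
    AND ('ffffffff-ffff-ffff-ffff-ffffffffffff')
    REGIONS 16;  -- 16 is illustrative; choose a count that fits the cluster

-- Inspect the resulting index Regions and the stores their leaders sit on.
SHOW TABLE business_bucket_directory_node INDEX uk_uuid REGIONS;
```

With ("0") AND ("z") as bounds, most of the evenly spaced split points fall above ‘f’, where no UUID can ever land, so few of the new Regions actually receive data.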

Best practice is also to store UUIDs as BINARY(16) rather than as strings. As long as the swap_flag argument of UUID_TO_BIN() is not set, hotspots are unlikely to form.
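
A minimal sketch of the BINARY(16) approach is shown below. The table uuid_demo and its columns are hypothetical and exist only for illustration; they are not the poster's schema.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE uuid_demo (
    id   BINARY(16) NOT NULL PRIMARY KEY,
    name VARCHAR(255)
);

-- UUID_TO_BIN() with a single argument leaves swap_flag unset (0),
-- so the time fields are not moved to the front of the value.
INSERT INTO uuid_demo (id, name) VALUES (UUID_TO_BIN(UUID()), 'root');

-- Convert back to the textual form when reading.
SELECT BIN_TO_UUID(id) AS id, name FROM uuid_demo;
```

Besides avoiding the swap, the binary form takes 16 bytes per value instead of 36 characters, which also keeps the index smaller.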

| username: TiDB_Paper

We have seen that part of the official documentation.

We use TiDB to store file directory data, and recursive queries always start from the root directory. In the traffic visualization (Key Visualizer) graph, there is always a bright block in the UUID range of the directory table.

Our questions are specifically about this graph in the hotspot monitoring.

| username: 有猫万事足

In that case, replica read is probably a better fit, because it lets follower replicas share the read load on the hot Region.

`tidb_load_based_replica_read_threshold`

Try lowering this value from its default of 1s. That may suit your workload better.
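
A hedged sketch of how this could be tried is below. To my knowledge this variable was only introduced in v7.0.0, so it may not exist on a v6.3.0 cluster; please check first. The 500ms value is purely illustrative, and the `tidb_replica_read` setting is shown as a possible alternative for older versions, not as something suggested in this thread.

```sql
-- Check whether the variable exists on this cluster (an empty result means it does not).
SHOW VARIABLES LIKE 'tidb_load_based_replica_read_threshold';

-- If it exists: lower the threshold so reads fall back to replicas sooner.
-- '500ms' is an illustrative value, not a recommendation.
SET GLOBAL tidb_load_based_replica_read_threshold = '500ms';

-- Possible alternative on versions without that variable (assumption to verify):
-- let the current session read from followers as well as the leader.
SET SESSION tidb_replica_read = 'leader-and-follower';
```

Either setting only helps with read hotspots; it does nothing for a write hotspot, since writes always go through the Region leader.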

| username: xfworld

Is the hotspot on the table data or on an index? You need to determine that first and address it specifically; otherwise, no amount of tuning will help.
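
One way to check this might be the INFORMATION_SCHEMA.TIDB_HOT_REGIONS table, sketched below. The table name in the filter is taken from the original question; adjust it as needed.

```sql
-- List the current hot Regions for the table from the original question.
-- INDEX_NAME is NULL when the hot Region holds row data rather than index data;
-- TYPE distinguishes read hotspots from write hotspots.
SELECT db_name, table_name, index_name, region_id, type,
       max_hot_degree, flow_bytes
FROM information_schema.tidb_hot_regions
WHERE table_name = 'business_bucket_directory_node'
ORDER BY flow_bytes DESC;
```

If the rows that come back show uk_uuid in index_name, splitting the index is the right lever; if index_name is NULL, the hotspot is on the row data and splitting the index will not change it.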