After upgrading to version 7.5.1, the tidb_server_connections metric is abnormal

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 升级到7.5.1版本后tidb_server_connections指标获取异常

| username: dba-kit

As shown in the figure, the total number of connections is 338, but the /metrics interface returns a value of 1 for tidb_server_connections, which looks more like the number of active connections.

| username: Daniel-W | Original post link

How about trying to curl from a different tidb-server node?

| username: Jiawei | Original post link

Alternatively, use netstat to filter for the connections that are actually established. Checking directly seems to give the connection count for the entire cluster, whereas the curl command only returns the count for a single instance. You could also check which expression the monitoring panel uses for the connection count.
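
To make the single-instance check concrete, here is a minimal sketch that parses Prometheus text-format output and lists every tidb_server_connections series one instance exposes. The payload and its values are invented for illustration, and parse_samples is a hypothetical helper, not part of any TiDB tooling. In practice the text would come from the instance's status port (10080 by default), for example by saving the curl output to a file.

```python
import re

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text, metric):
    """Return a list of (labels, value) pairs for the given metric name."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((labels, float(m.group("value"))))
    return samples


# Hypothetical payload from ONE tidb-server; the numbers are made up.
SAMPLE_PAYLOAD = """\
# TYPE tidb_server_connections gauge
tidb_server_connections{resource_group="default"} 12
tidb_server_connections 1
"""

samples = parse_samples(SAMPLE_PAYLOAD, "tidb_server_connections")
for labels, value in samples:
    print(labels or "(no labels)", value)

# Sum over all series reported by this single instance.
total = sum(value for _, value in samples)
print("instance total:", total)
```

Comparing this per-instance listing with the netstat count on the same host shows whether a given series is undercounting or whether the dashboard is simply aggregating differently.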

| username: buddyyuan | Original post link

Are the other metrics normal? For example, tidb_server_tokens, which is used to monitor the number of active sessions.

| username: TiDBer_5cwU0ltE | Original post link

There might be some helpful information in the alert logs.

| username: redgame | Original post link

Did you configure this during the upgrade?

| username: dba-kit | Original post link

Strangely, though, among the multiple tidb-servers in the same cluster, one machine's metrics are normal: it exposes only a single series, for the default resource group.

The other, abnormal nodes additionally report a series without the resource_group label.

| username: dba-kit | Original post link

After comparing the configurations, I found that the other tidb-server instances have the instance.tidb_force_priority parameter set, while the tidb-server with normal metrics does not have this configuration. I will remove it over the weekend and observe the results.

| username: aytrack | Original post link

This is a bug, tracked in "Connection count metric can be less than the real value" (Issue #51889, pingcap/tidb). The problem was introduced by the graceful-shutdown enhancement, "server: enhance graceful stop by closing connections after finish the ongoing txn" (Pull Request #32111 by july2993, pingcap/tidb). It only became visible once the per-resource-group metrics were added in "metrics: add connection and fail metrics by `resource group name`" (Pull Request #49424 by bufferflies, pingcap/tidb).

| username: Jellybean | Original post link

The expert is very meticulous, a bug-catching master :+1:

| username: 田帅萌7 | Original post link

Same issue here: the Connection Count panel shows duplicate IPs.

Temporary workaround: tidb_server_connections{k8s_cluster="$k8s_cluster", tidb_cluster="$tidb_cluster", resource_group="default"}

| username: dba-kit | Original post link

I simply aggregated with sum ... by (instance). The values are still incorrect, but at least there is only one line per instance now.
sum(tidb_server_connections{cluster="$tidb_cluster"}) by (instance)
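
The two workarounds in this thread (filtering on resource_group="default" versus summing by instance) can be compared with a small sketch. It only mimics the aggregation on invented samples; the instance names and values are hypothetical, and filter_by_label and sum_by are illustrative helpers, not PromQL or TiDB APIs.

```python
from collections import defaultdict

# Hypothetical scraped samples as (labels, value); all numbers are invented.
samples = [
    ({"instance": "tidb-0:10080", "resource_group": "default"}, 120.0),
    ({"instance": "tidb-1:10080", "resource_group": "default"}, 110.0),
    ({"instance": "tidb-1:10080"}, 1.0),  # extra series without resource_group
]


def filter_by_label(samples, key, value):
    """Keep only series whose label `key` equals `value` (like {key="value"})."""
    return [(labels, v) for labels, v in samples if labels.get(key) == value]


def sum_by(samples, key):
    """Add up values per distinct value of label `key` (like sum(...) by (key))."""
    totals = defaultdict(float)
    for labels, v in samples:
        totals[labels.get(key, "")] += v
    return dict(totals)


# Workaround 1: select only the resource_group="default" series.
filtered = filter_by_label(samples, "resource_group", "default")

# Workaround 2: collapse all series of an instance into one line.
per_instance = sum_by(samples, "instance")

print(filtered)
print(per_instance)
```

Both approaches leave one line per instance on the panel. Neither can repair a series whose value the server itself undercounts, which would match the observation that the numbers stay wrong after aggregation.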
