Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 多集群promethus的部署方案 (Deployment plan for multi-cluster Prometheus)
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact] Deployment plan for Prometheus across multiple clusters: how can Prometheus automatically pick up node changes when a cluster is scaled out or scaled in?
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
Automatic scaling? Isn’t it all manual scaling?
What I mean is: for example, I have 5 clusters that all need to be monitored. I want to use a single Prometheus setup. How can I automatically ensure that when nodes are scaled up or down, Prometheus can detect this and automatically add or remove the monitored nodes?
Each cluster has its own Prometheus, which is easier to manage. Using a single Prometheus for the entire system means that every time you scale up or down, the configuration file is regenerated, and information from other clusters is lost.
In other words, it is best for each cluster to have its own Prometheus. With 5 clusters there will be 5 Prometheus instances, so that newly added nodes are monitored automatically and scaled-in nodes are dropped from monitoring automatically.
Best not to share one, otherwise you’ll have to handle it manually. One Prometheus per cluster should work better.
Yes, I have discovered this issue. So, are there also 5 alerts in the alert system? And are there 5 in Grafana as well?
Got it.
Wouldn’t it be clearer to look at them separately? Integrating several sets into one Grafana makes it difficult to distinguish issues with individual nodes.
These monitoring nodes don’t consume many resources. At worst, just change the ports and put them all on one machine.
Otherwise, if a large amount of monitoring information from one cluster is reported, causing the monitoring of other clusters to fail, wouldn’t that be even more troublesome?
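On the idea of putting several clusters’ monitoring components on one machine with different ports: below is a minimal sketch of what the monitoring section of a second cluster’s TiUP topology might look like. The host, ports, and directories are made-up examples, not values from this thread; the field names follow the TiUP cluster topology format.

```yaml
# Monitoring section of a hypothetical second cluster's topology file.
# Ports are shifted so they don't clash with the first cluster's
# Prometheus (9090), Grafana (3000), and Alertmanager (9093/9094).
monitoring_servers:
  - host: 10.0.0.100
    port: 9091
    deploy_dir: /data/cluster-b/prometheus-9091
    data_dir: /data/cluster-b/prometheus-9091/data

grafana_servers:
  - host: 10.0.0.100
    port: 3001
    deploy_dir: /data/cluster-b/grafana-3001

alertmanager_servers:
  - host: 10.0.0.100
    web_port: 9094
    cluster_port: 9095
    deploy_dir: /data/cluster-b/alertmanager-9094
```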
I’ll still deploy them separately. I ran into the problem above and tested it; it’s exactly the situation you described.
Why pull them all into one place to view? Wouldn’t that be more inconvenient?
You can use confd to dynamically generate Prometheus configuration files.
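To illustrate the idea (a sketch, not a verified setup): instead of rewriting prometheus.yml itself, confd, or any small script, can keep one target file per cluster up to date, and Prometheus’s file_sd_configs will reload those files without a restart. The paths, job name, ports, and label below are assumptions for illustration only.

```yaml
# prometheus.yml of the shared instance (sketch)
scrape_configs:
  - job_name: "tidb-clusters"
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.yml   # one file per cluster, rewritten by confd or a script
        refresh_interval: 1m                # Prometheus re-reads the files on this interval
```

```yaml
# /etc/prometheus/targets/cluster-a.yml (sketch) -- regenerate on every scale-in/scale-out
- targets: ["10.0.1.11:10080", "10.0.1.12:20180", "10.0.1.13:2379"]
  labels:
    cluster: "cluster-a"
```

This way, scaling one cluster only rewrites that cluster’s target file, so the other clusters’ entries are never touched, which avoids the “config gets regenerated and the other clusters disappear” problem described above.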
Having one Prometheus per TiDB cluster is indeed quite a waste of resources.
That’s right, it is quite resource-intensive, but it makes maintenance easier.
Learned a lot, thanks for sharing.
Resources and convenience are directly proportional.
Take a look at Prometheus federation, which might meet your needs: 联邦集群 (Federation) | prometheus-book
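For context, federation means a central Prometheus scrapes the /federate endpoint of each per-cluster Prometheus. Here is a minimal sketch of the central instance’s scrape config; the hostnames and the match[] selector are placeholders:

```yaml
scrape_configs:
  - job_name: "federate"
    honor_labels: true          # keep the labels set by each cluster's own Prometheus
    metrics_path: /federate
    params:
      "match[]":
        - '{__name__=~"tidb_.*|tikv_.*|pd_.*"}'   # placeholder selector; pull only what you need
    static_configs:
      - targets:
          - "cluster-a-prometheus:9090"
          - "cluster-b-prometheus:9090"
```

Each cluster keeps its own Prometheus, so scale-in/scale-out is still handled per cluster by tiup, and the central instance only aggregates, giving you one place for Grafana and alerting.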
It is more appropriate to monitor each cluster separately.