Deployment Plan for Multi-Cluster Prometheus

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 多集群promethus的部署方案

| username: 考试没答案

[Encountered Issue: Problem Phenomenon and Impact] Deployment plan for monitoring multiple clusters with Prometheus: how can the monitored nodes be added and removed automatically as the clusters scale in and out?

| username: Kongdom | Original post link

:flushed: Automatic scaling? Isn’t it all manual scaling?

| username: 考试没答案 | Original post link

What I mean is: say I have 5 clusters that all need to be monitored, and I want to use a single Prometheus for them. When nodes are scaled in or out, how can Prometheus detect the change and automatically add or remove the monitored targets?
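For context, Prometheus's file-based service discovery (`file_sd_configs`) can pick up target-list changes without a restart, which is one way to get this behavior. A minimal sketch, where the file paths, ports, and labels are hypothetical:

```yaml
# prometheus.yml -- watch per-cluster target files for changes
scrape_configs:
  - job_name: "tidb-clusters"
    file_sd_configs:
      - files:
          - "/etc/prometheus/targets/*.json"   # one file per cluster
        refresh_interval: 1m                   # fallback re-read interval
# Each file lists that cluster's targets, e.g. targets/cluster1.json:
# [{"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"cluster": "cluster1"}}]
```

Your scale-in/scale-out scripts would only need to rewrite the matching JSON file; Prometheus notices the change on its own.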

| username: chenhanneu | Original post link

Each cluster having its own Prometheus is easier to manage. If a single Prometheus serves all the clusters, every scale-in or scale-out regenerates its configuration file, and the targets you added for the other clusters are lost.

| username: 考试没答案 | Original post link

In other words, it is best for each cluster to have its own Prometheus. With 5 clusters there will be 5 Prometheus instances, so that newly added nodes are monitored automatically and scaled-in nodes are decommissioned automatically.

| username: onlyacat | Original post link

It’s best not to, otherwise you’ll have to handle it manually. Having one Prometheus for each cluster should be better.

| username: 考试没答案 | Original post link

Yes, I have run into this issue. So will there also be 5 Alertmanager instances in the alerting setup? And 5 Grafana instances as well?

| username: 像风一样的男子 | Original post link

Wouldn’t it be clearer to look at them separately? Integrating several sets into one Grafana makes it difficult to distinguish issues with individual nodes.

| username: onlyacat | Original post link

These monitoring components don't consume many resources. At worst, just change the ports and run them all on one machine.

Otherwise, if one cluster reports a huge volume of monitoring data and that breaks the monitoring for the other clusters, wouldn't it be even more troublesome?
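For example, with TiUP each cluster's topology file can move the monitoring components to non-default ports so several clusters share one host. A sketch; the host and port numbers below are hypothetical (defaults are 9090/3000/9093):

```yaml
# topology.yaml fragment for a second cluster on the shared host
monitoring_servers:
  - host: 10.0.0.10
    port: 9091            # Prometheus, default 9090
grafana_servers:
  - host: 10.0.0.10
    port: 3001            # Grafana, default 3000
alertmanager_servers:
  - host: 10.0.0.10
    web_port: 9095        # Alertmanager web, default 9093
    cluster_port: 9096    # Alertmanager peer, default 9094
```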

| username: 考试没答案 | Original post link

I will still deploy them separately. I ran into the problem above and tested it; it is exactly the situation you described.

| username: chris-zhang | Original post link

Why pull them all into one place to view? Wouldn't that be more inconvenient?

| username: jetora | Original post link

You can use confd to dynamically generate Prometheus configuration files.
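A minimal sketch of that approach, assuming the targets are stored in etcd under `/clusters/...` (all paths, keys, and ports here are hypothetical):

```toml
# /etc/confd/conf.d/prometheus.toml -- confd template resource
[template]
src  = "prometheus.yml.tmpl"
dest = "/etc/prometheus/prometheus.yml"
keys = ["/clusters"]
# Validate the rendered file before installing it.
check_cmd = "promtool check config {{.src}}"
# Requires Prometheus to run with --web.enable-lifecycle.
reload_cmd = "curl -s -X POST http://127.0.0.1:9090/-/reload"
```

```yaml
# /etc/confd/templates/prometheus.yml.tmpl -- rendered into prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "tidb-components"
    static_configs:
{{- range gets "/clusters/*/targets/*"}}
      - targets: ["{{.Value}}"]
        labels:
          cluster: "{{base (dir (dir .Key))}}"   # cluster name from the key path
{{- end}}
```

Run confd against your etcd (e.g. `confd -backend etcdv3 -watch`), and have the scaling scripts add or delete keys under `/clusters/<name>/targets/`; confd re-renders the file and triggers the Prometheus reload.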

| username: 源de爸 | Original post link

Having one Prometheus per TiDB cluster is indeed quite a waste of resources.

| username: DBAER | Original post link

This is the best.

| username: 考试没答案 | Original post link

That’s right, it is quite resource-intensive, but it makes maintenance easier.

| username: TiDBer_rvITcue9 | Original post link

Learned a lot, thanks for sharing.

| username: 健康的腰间盘 | Original post link

Resources and convenience are directly proportional.

| username: TiDBer_QYr0vohO | Original post link

Take a look at Prometheus federation, which might meet your needs: 联邦集群 (Federation) | prometheus-book
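With federation, a global Prometheus scrapes each cluster's Prometheus via its `/federate` endpoint. A minimal sketch; the hostnames and the `match[]` selector below are hypothetical:

```yaml
# Global Prometheus that federates from each cluster's Prometheus
scrape_configs:
  - job_name: "federate"
    scrape_interval: 15s
    honor_labels: true          # keep the original job/instance labels
    metrics_path: "/federate"
    params:
      "match[]":
        - '{job=~".+"}'         # pull everything; narrow this in practice
    static_configs:
      - targets:
          - "cluster1-prometheus:9090"
          - "cluster2-prometheus:9090"
```

Each cluster keeps its own Prometheus (so scaling stays automatic), while the global instance gives you a single place to query and alert across all of them.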

| username: TiDBer_嘎嘣脆 | Original post link

It is more appropriate to monitor each set separately.