Can I turn off Prometheus, Grafana, and AlertManager components in a TiDB cluster?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群,我可以关掉prometheus与grafana、alertManager这些组件吗?

| username: 突破边界

[TiDB Usage Environment] Testing

[TiDB Version] 7.6.0
I found that TiDB’s memory usage is relatively high.

I want to turn off some less important monitoring components, such as Prometheus and Grafana, and only keep the dashboard that comes with the TiDB cluster. Is this feasible?

I tried to do this by using the command tiup cluster edit-config tidb-test, but after saving, I got the following prompt:

tiup is checking updates for component cluster ...timeout(2s)!
Starting component cluster: /root/.tiup/components/cluster/v1.14.1/tiup-cluster edit-config tidb-test
New topology could not be saved: immutable field changed: removed Monitors.0.Host with value '192.168.0.150', removed Monitors.0.ssh_port with value '22', removed Monitors.0.Port with value '11400', removed Monitors.0.ng_port with value '11480', removed Monitors.0.DeployDir with value '/home/tidb/tidb-deploy/prometheus-11400', removed Monitors.0.DataDir with value '/mnt/filemanage/tidb/tidb-data/prometheus-11400', removed Monitors.0.LogDir with value '/mnt/filemanage/tidb/tidb-deploy/prometheus-11400/log', removed Monitors.0.Arch with value 'amd64', removed Monitors.0.OS with value 'linux'
Do you want to continue editing? [Y/n]: (default=Y) 
| username: caiyfc | Original post link

If you are sure you want to turn off these monitoring components, you can directly scale down. The edit-config cannot directly scale down. If you need these monitoring components in the future, you can scale up again.

| username: 这里介绍不了我 | Original post link

Sure, but it’s not recommended. If you want to remove it, then scale in using tiup-cluster-scale-in.

| username: TiDBer_刚 | Original post link

These have little impact on performance. What monitoring would you use if you turned them off?

| username: 有猫万事足 | Original post link

In that case, if a node goes down, who will proactively send you alerts? You can’t be watching this dashboard 24/7, right? Waking up to see a string of missed alerts feels really frustrating.

| username: YuchongXU | Original post link

Sure, but it’s not recommended.

| username: lemonade010 | Original post link

The test cluster can be removed, but try to keep it in production, as it is the most friendly way for future reviews.

| username: TiDBer_QKDdYGfz | Original post link

I don’t think it’s necessary to shut it down. If it’s to avoid affecting the performance of the service itself, deploying separately is a good choice.

| username: zhaokede | Original post link

If resources are not very tight, there’s no need to do this.
These tools are just little helpers for operations and maintenance.

| username: 霸王龙的日常 | Original post link

There is no need to turn them off; the benefits of disabling these components are minimal.

| username: TIDB-Learner | Original post link

If it has reached the point where you want to scale down the monitoring plugins to free up resources, you should actually talk to your boss about adding another server.

| username: vincentLi | Original post link

Add more memory. Memory is so cheap now~

| username: zhaokede | Original post link

It may not necessarily be memory; it could also be computing resources.

| username: 舞动梦灵 | Original post link

This is a comprehensive monitoring system. If you don’t use this monitoring, you won’t have clear service data information for TiDB. Moreover, your issue is that TiDB, PD, TiKV, and TiMon are all on one machine? This is inherently unreasonable and will naturally consume a lot of memory. The normal plan should be one machine for TiMon, three machines for TiDB, three machines for PD, and three or more machines for TiKV.

| username: zhanggame1 | Original post link

You can directly scale down, but some data on the dashboard is also obtained from Prometheus. If it’s not available, it won’t be displayed.

| username: 呢莫不爱吃鱼 | Original post link

It is possible, but not recommended. It lacks the means and basis for querying issues.

| username: 健康的腰间盘 | Original post link

It is possible to scale down, but it is not highly recommended.

| username: tony5413 | Original post link

In the test environment, stop it first in case it is needed later. Use tiup cluster stop tidb-test -N xxx.xxx.xxx.xxx:3000 and tiup cluster stop tidb-test -N xxx.xxx.xxx.xxx:9093. You can try shutting down Grafana and Alertmanager, but Prometheus cannot be shut down as the dashboard retrieves data from Prometheus.

| username: Kongdom | Original post link

Sure, but it’s not recommended :thinking:

| username: TiDBer_LM | Original post link

In a test environment, it doesn’t matter, but in a production environment, it’s better to keep it on. Otherwise, how will the operations team manage?