Can I turn off Prometheus, Grafana, and AlertManager components in a TiDB cluster?

translator_bot · June 25, 2024, 1:12pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群，我可以关掉prometheus与grafana、alertManager这些组件吗？

| username: 突破边界

[TiDB Usage Environment] Testing

[TiDB Version] 7.6.0
I found that TiDB’s memory usage is relatively high.

I want to turn off some less important monitoring components, such as Prometheus and Grafana, and only keep the dashboard that comes with the TiDB cluster. Is this feasible?

I tried to do this by deleting the monitoring entries with the command tiup cluster edit-config tidb-test, but after saving, I got the following error:

tiup is checking updates for component cluster ...timeout(2s)!
Starting component cluster: /root/.tiup/components/cluster/v1.14.1/tiup-cluster edit-config tidb-test
New topology could not be saved: immutable field changed: removed Monitors.0.Host with value '192.168.0.150', removed Monitors.0.ssh_port with value '22', removed Monitors.0.Port with value '11400', removed Monitors.0.ng_port with value '11480', removed Monitors.0.DeployDir with value '/home/tidb/tidb-deploy/prometheus-11400', removed Monitors.0.DataDir with value '/mnt/filemanage/tidb/tidb-data/prometheus-11400', removed Monitors.0.LogDir with value '/mnt/filemanage/tidb/tidb-deploy/prometheus-11400/log', removed Monitors.0.Arch with value 'amd64', removed Monitors.0.OS with value 'linux'
Do you want to continue editing? [Y/n]: (default=Y)

translator_bot · June 25, 2024, 1:12pm

| username: caiyfc | Original post link

If you are sure you want to turn off these monitoring components, you can scale them in directly with tiup cluster scale-in. edit-config cannot remove nodes, which is why it rejected your change. If you need these monitoring components in the future, you can scale them out again.
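
A minimal sketch of the scale-in route, assuming the cluster name tidb-test and the Prometheus node 192.168.0.150:11400 shown in the error output of the question. The run helper and the DRY_RUN variable are made up for this example; by default the script only prints the commands so they can be reviewed before anything is removed. The Grafana and Alertmanager node IDs do not appear in the thread, so they would have to be read from the display output.

```shell
#!/bin/sh
# Sketch only: cluster name and node ID are taken from the question above.
CLUSTER="tidb-test"
PROM_NODE="192.168.0.150:11400"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List all nodes to find the IDs (host:port) of the monitoring components.
run tiup cluster display "$CLUSTER"

# Remove the Prometheus node; repeat with the Grafana and Alertmanager IDs.
run tiup cluster scale-in "$CLUSTER" --node "$PROM_NODE"
```

Running it with DRY_RUN=0 would execute the commands for real; tiup asks for confirmation before it scales a node in.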

translator_bot · June 25, 2024, 1:12pm

| username: 这里介绍不了我 | Original post link

Sure, but it’s not recommended. If you want to remove them, scale them in using tiup cluster scale-in.

translator_bot · June 25, 2024, 1:12pm

| username: TiDBer_刚 | Original post link

These have little impact on performance. What monitoring would you use if you turned them off?

translator_bot · June 25, 2024, 1:12pm

| username: 有猫万事足 | Original post link

In that case, if a node goes down, who will proactively send you alerts? You can’t be watching this dashboard 24/7, right? Waking up to see a string of missed alerts feels really frustrating.

translator_bot · June 25, 2024, 1:12pm

| username: YuchongXU | Original post link

Sure, but it’s not recommended.

translator_bot · June 25, 2024, 1:12pm

| username: lemonade010 | Original post link

In a test cluster they can be removed, but try to keep them in production, as they are the most convenient basis for reviewing incidents afterwards.

translator_bot · June 25, 2024, 1:12pm

| username: TiDBer_QKDdYGfz | Original post link

I don’t think it’s necessary to shut it down. If it’s to avoid affecting the performance of the service itself, deploying separately is a good choice.
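
One way to deploy the monitoring separately could look like the sketch below. It assumes the old monitoring nodes have already been scaled in, and MONITOR_HOST (192.168.0.151) is a hypothetical address for a separate machine that does not appear in this thread. The file name is also an assumption. The script writes a scale-out topology and prints the tiup command instead of running it, so the file can be checked first.

```shell
#!/bin/sh
# Sketch only: MONITOR_HOST is a hypothetical address for a separate machine.
CLUSTER="tidb-test"
MONITOR_HOST="${MONITOR_HOST:-192.168.0.151}"
TOPO_FILE="${TOPO_FILE:-scale-out-monitor.yaml}"

# Write a scale-out topology that places Prometheus, Grafana, and
# Alertmanager on the separate host, using default ports and directories.
write_topology() {
  cat > "$TOPO_FILE" <<EOF
monitoring_servers:
  - host: $MONITOR_HOST
grafana_servers:
  - host: $MONITOR_HOST
alertmanager_servers:
  - host: $MONITOR_HOST
EOF
}

write_topology

# Printed rather than executed, so the topology file can be reviewed first.
echo "tiup cluster scale-out $CLUSTER $TOPO_FILE"
```

The same topology file also covers the case where the components were removed earlier and need to be brought back.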

translator_bot · June 25, 2024, 1:12pm

| username: zhaokede | Original post link

If resources are not very tight, there’s no need to do this.
These tools are just little helpers for operations and maintenance.

translator_bot · June 25, 2024, 1:12pm

| username: 霸王龙的日常 | Original post link

There is no need to turn them off; the benefits of disabling these components are minimal.

translator_bot · June 25, 2024, 1:12pm

| username: TIDB-Learner | Original post link

If it has reached the point where you want to scale down the monitoring plugins to free up resources, you should actually talk to your boss about adding another server.

translator_bot · June 25, 2024, 1:12pm

| username: vincentLi | Original post link

Add more memory. Memory is so cheap now~

translator_bot · June 25, 2024, 1:12pm

| username: zhaokede | Original post link

It may not necessarily be memory; it could also be computing resources.

translator_bot · June 25, 2024, 1:12pm

| username: 舞动梦灵 | Original post link

This is a comprehensive monitoring system. If you don’t use it, you won’t have clear service metrics for TiDB. Also, is your real issue that TiDB, PD, TiKV, and TiMon are all on one machine? That layout is inherently unreasonable and will naturally consume a lot of memory. The normal plan should be one machine for TiMon, three machines for TiDB, three machines for PD, and three or more machines for TiKV.

translator_bot · June 25, 2024, 1:12pm

| username: zhanggame1 | Original post link

You can scale them in directly, but some data on the dashboard is also obtained from Prometheus. If Prometheus is not available, that data won’t be displayed.

translator_bot · June 25, 2024, 1:12pm

| username: 呢莫不爱吃鱼 | Original post link

It is possible, but not recommended. Without them you lose the means and the data for troubleshooting issues.

translator_bot · August 9, 2024, 3:20am

| username: 健康的腰间盘 | Original post link

It is possible to scale them in, but it is not recommended.

translator_bot · August 9, 2024, 3:20am

| username: tony5413 | Original post link

In a test environment, stop them first instead of removing them, in case they are needed later. Use tiup cluster stop tidb-test -N xxx.xxx.xxx.xxx:3000 and tiup cluster stop tidb-test -N xxx.xxx.xxx.xxx:9093. You can try shutting down Grafana and Alertmanager, but Prometheus should not be shut down, as the dashboard retrieves data from Prometheus.
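
The stop-instead-of-remove approach can be sketched as follows, with the matching start commands for bringing the components back. The host is left as the xxx.xxx.xxx.xxx placeholder from the post, and ports 3000 (Grafana) and 9093 (Alertmanager) are the ones quoted there; your cluster may use different ones. The node_cmd helper is made up for this example and only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the tiup commands rather than running them.
CLUSTER="tidb-test"
# Placeholder kept from the post; replace with the real monitoring host.
MONITOR_HOST="xxx.xxx.xxx.xxx"

# Hypothetical helper: build a tiup command for one node.
# $1 = action (stop or start), $2 = port of the component
node_cmd() {
  echo "tiup cluster $1 $CLUSTER -N $MONITOR_HOST:$2"
}

node_cmd stop 3000    # Grafana
node_cmd stop 9093    # Alertmanager

# Later, to bring them back:
node_cmd start 3000
node_cmd start 9093
```

Because the nodes stay in the topology, nothing has to be scaled out again when monitoring is needed.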

translator_bot · August 9, 2024, 3:20am

| username: Kongdom | Original post link

Sure, but it’s not recommended.

translator_bot · August 9, 2024, 3:20am

| username: TiDBer_LM | Original post link

In a test environment, it doesn’t matter, but in a production environment, it’s better to keep it on. Otherwise, how will the operations team manage?