Suggestion to Add Monitoring for Viewing Detailed Internal Resource Usage of TiDB and TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 建议增加查看tidb、tikv内部资源使用详情的监控

| username: 像风一样的男子

I have run into many cases where TiDB memory usage grows and troubleshooting takes a long time. Could you add monitoring that shows the memory and CPU usage of TiDB and TiKV broken down by dimensions such as SQL statement, cache, and so on? That way, when memory or CPU usage rises, we could immediately identify which part is causing the problem.

| username: Fly-bird | Original post link

This is a good idea; the official team could consider it.

| username: TiDBer_周武 | Original post link

It currently uses a large amount of memory, so more detailed breakdowns should indeed be added.

| username: ShawnYan | Original post link

You can check the flame graphs and traces in TiDB Dashboard, and you can also download some debug information there.

| username: 像风一样的男子 | Original post link

What you mentioned is too abstract. Wouldn't it be better to have one place where you can see this directly? A feature like this would be far ahead of what competitors offer.

| username: xfworld | Original post link

Prometheus covers monitoring and troubleshooting needs in typical situations. For the scenario you describe, you can use the flame graphs or traces that ShawnYan suggested for further investigation.

| username: 像风一样的男子 | Original post link

Uh, this is a product requirement I proposed to the vendor!

| username: xfworld | Original post link

Alright, please try to be as detailed as possible, ideally specifying the exact requirements for each feature.

Otherwise, it may be overlooked: its value to the product would look too low, and too many related factors would need to be considered…

For example, state whether you expect an enhancement to an existing feature, an integration of overall capabilities, or a new functional perspective…

| username: ShawnYan | Original post link

Please spell out the specific monitoring metrics and the display format in more detail, so the product team is more likely to notice this suggestion.

| username: residentevil | Original post link

For example, the sys schema in MySQL 8.0 provides the following four views, each of which shows details of the instance's current memory usage. They are particularly helpful when analyzing problems:

  • memory_by_host_by_current_bytes: Memory usage by host
  • memory_by_thread_by_current_bytes: Memory usage by thread
  • memory_by_user_by_current_bytes: Memory usage by user
  • memory_global_total: Total memory usage of the entire mysqld server

| username: TiDBer_vfJBUcxl | Original post link

This requirement is practical.

| username: ajin0514 | Original post link

Good idea, looking forward to official support.