TiDB Dashboard Debugging Error

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB dashboard 调试错误

| username: GreenGuan

Could you please explain why CPU profiling of the TiKV node reports an error and the result cannot be viewed when I run manual profiling on the Advanced Debugging page? I checked the instance and found no files matching *.proto.

| username: WalterWj | Original post link

:thinking: Is there anything in the /tmp directory? :thinking: Take a look. How did you check this instance?

| username: GreenGuan | Original post link

I used manual profiling. There is nothing in /tmp.

| username: GreenGuan | Original post link

Another piece of information to add: I am deploying TiKV with multiple instances on a single machine.

I also checked another cluster, which runs version 6.5.9. This time, the error was on Heap.

| username: xfworld | Original post link

That depends on which path the deployment is configured to use for storing such temporary files. By default, it is /tmp.

| username: WalterWj | Original post link

This might be because a manual profiling run was started from the dashboard (or the /debug/pprof/profile interface on the TiKV status port was called directly) while Continuous Profiling was running, causing the CPU profiling initiated by Continuous Profiling to fail. A single process can handle only one CPU profiling request at a time.

In simple terms, did you trigger flame graph capture more than once? :thinking: Did you enable Continuous Profiling?
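To make the direct call to /debug/pprof/profile concrete, here is a minimal Python sketch of how one might request a CPU profile from a TiKV status port and see whether it is rejected. Several things here are assumptions and not from this thread: the host, the status port (20180 is the usual default, and with multiple TiKV instances on one machine each instance has its own port), the `seconds` query parameter, and the helper names `profile_url` and `fetch_cpu_profile`, which are purely illustrative.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def profile_url(host: str, status_port: int, seconds: int = 10) -> str:
    """Build the CPU profile URL for one TiKV instance.

    The `seconds` query parameter is an assumption about the endpoint.
    """
    query = urlencode({"seconds": seconds})
    return f"http://{host}:{status_port}/debug/pprof/profile?{query}"


def fetch_cpu_profile(host: str, status_port: int, seconds: int = 10):
    """Request a CPU profile and return (http_status, body_bytes).

    If another CPU profile is already running in the same process, the
    request is expected to fail; the status and body returned here show
    what the instance answered. http_status is None when the instance
    could not be reached at all.
    """
    url = profile_url(host, status_port, seconds)
    try:
        # Allow extra time on top of the sampling window.
        with urllib.request.urlopen(url, timeout=seconds + 30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered with an error status; keep its message.
        return err.code, err.read()
    except urllib.error.URLError as err:
        # Connection-level failure (wrong port, instance down, ...).
        return None, str(err.reason).encode()


# Example (hypothetical ports for two instances on one machine):
# for port in (20180, 20181):
#     status, body = fetch_cpu_profile("127.0.0.1", port)
#     print(port, status, len(body))
```

Running this while Continuous Profiling is enabled, and again after it is disabled, should show whether the failure is tied to two CPU profiling requests hitting the same process at once.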

| username: WalterWj | Original post link

You can search the TiKV logs for these two keywords to see whether two profiles were being captured at the same time. :thinking:

| username: WalterWj | Original post link

Or check the monitoring at this location:

path=/debug/pprof/profile

| username: GreenGuan | Original post link

In version 7.5.1, manual profiling cannot be started while Continuous Profiling is enabled. After Continuous Profiling is turned off, the corresponding profile is not cleaned up properly, which causes manual CPU profiling to fail (for comparison, CPU profiling worked normally during the first Continuous Profiling run). The awkward part is that this result cannot be deleted manually.

| username: GreenGuan | Original post link

There is no relevant directory.

| username: WalterWj | Original post link

I don’t think the leftover files are the issue. It looks like two flame graph captures were running at once: one was probably already in progress, and manually starting another caused the error.

| username: GreenGuan | Original post link

After restarting TiDB, the issue did not reoccur.

| username: WalterWj | Original post link

So restarting can solve 80% of the problems…

| username: WinterLiu | Original post link

It turns out that restarting really can solve the problem :grin: