Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 用TiDB Operator 部署的tidb集群,discovery一直重启正常吗 (Is it normal for the discovery component of a TiDB cluster deployed with TiDB Operator to keep restarting?)
[TiDB Usage Environment] Testing
[TiDB Version] v7.5.1
[Reproduction Path] Install and deploy the cluster with TiDB Operator; after deployment, the discovery pod keeps restarting.
[Encountered Problem: Phenomenon and Impact] The discovery pod keeps restarting, but the cluster is usable and creating databases and tables works fine.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

[Screenshot: logs]
It’s not normal; check what the logs show.
The image below is the discovery log. Do I need to check any other logs?
It doesn’t affect usage, but will constant restarting be a potential issue?
At least from the code, discovery exits when it receives any of four signals. The logs clearly show that it received SIGTERM, which is the signal kill sends by default. So it’s not yet clear whether some script is killing it or whether something else is going on.
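To help narrow that down, the kubelet records the last termination state on the pod object; the lines below are a rough sketch (the pod name is a placeholder, the namespace is the one used later in this thread):

# Show how the discovery container last terminated: exit code, signal, and a reason
# such as Error or OOMKilled (pod name is a placeholder).
kubectl get pod <discovery-pod> -n tidb-cluster-new \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
# For reference, a plain `kill <pid>` sends SIGTERM (15) by default; SIGKILL (9) has to be
# requested explicitly and cannot be trapped by the process.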
It’s not normal. Try running kubectl describe pod <discovery> -n tidb-cluster-new to check.
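If describe turns up nothing, the previous container’s logs and the namespace events sometimes still carry the termination reason; a rough sketch along the same lines (pod name is a placeholder):

# Logs from the previous (killed) container instance.
kubectl logs <discovery-pod> -n tidb-cluster-new --previous
# Recent events in the namespace, filtered to the discovery pod.
kubectl get events -n tidb-cluster-new --sort-by=.lastTimestamp | grep -i discovery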
It just shows that it was killed; there’s no other useful information.
Hmm, I can’t see any issue from that. Do your k8s nodes have enough resources? Try changing the deployment pod’s startup command to something that blocks, such as tail -f, then exec into the pod and run the discovery command manually to see whether that turns up anything useful.
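A rough sketch of that approach, assuming the Deployment is named <cluster-name>-discovery as TiDB Operator usually names it; note that the Operator reconciles this Deployment, so a manual override may get reverted:

# Override the container command so the pod just blocks instead of running discovery
# ("add" also replaces an existing command field, so it works either way).
kubectl patch deployment <cluster-name>-discovery -n tidb-cluster-new --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["tail","-f","/dev/null"]}]'
# Then exec into the pod and run the discovery start command by hand to watch its output.
kubectl exec -it <discovery-pod> -n tidb-cluster-new -- sh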
[Screenshot: tidb-controller-manager logs]
From the logs, this isn’t being triggered internally by the Operator. Start investigating at the k8s level, for example OOM kills, node scheduling, and so on.
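For the k8s-level checks, something along these lines works as a starting point (node name is a placeholder; kubectl top needs metrics-server installed):

# Node conditions (MemoryPressure, DiskPressure, ...) and allocated resources.
kubectl describe node <node-name>
# Actual pod resource usage, if metrics-server is installed.
kubectl top pod -n tidb-cluster-new
# On the node itself, check whether the kernel OOM killer was involved.
dmesg -T | grep -i -E 'oom|killed process'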
The discovery component doesn’t seem to do much anyway, and it doesn’t affect the cluster.
Doesn’t that log already show it was killed at the k8s level? What we’re looking for now is the reason for the kill.