Is it normal for the discovery service to keep restarting in a TiDB cluster deployed with TiDB Operator?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 用TiDB Operator 部署的tidb集群,discovery一直重启正常吗 (Is it normal for discovery to keep restarting in a TiDB cluster deployed with TiDB Operator?)

| username: daylight

[TiDB Usage Environment] Test environment
[TiDB Version] v7.5.1
[Reproduction Path] Installed and deployed the cluster with TiDB Operator; the discovery Pod has been restarting ever since.
[Encountered Problem: Phenomenon and Impact] The discovery Pod keeps restarting, but the cluster is usable and creating databases and tables works fine.
[Resource Configuration] (not provided)
[Attachments: Screenshots/Logs/Monitoring]
(screenshot and discovery logs attached in the original post)

| username: zhaokede | Original post link

It’s not normal. Check what the logs show.

| username: daylight | Original post link

The attached screenshot shows the discovery log. Do I need to check any other logs?

| username: lemonade010 | Original post link

  • discovery is a service used for discovery between components. Each TiDB cluster has one corresponding discovery Pod, which the components in the cluster use to discover other components that have already been created. This shouldn’t affect database usage, right?
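
For reference, one quick way to see the discovery Pod and its restart count (assuming the standard TiDB Operator component label and the tidb-cluster-new namespace mentioned later in this thread):

    kubectl -n tidb-cluster-new get pods -l app.kubernetes.io/component=discovery
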
| username: daylight | Original post link

It doesn’t affect usage, but will constant restarting be a potential issue?

| username: 有猫万事足 | Original post link

At least from the code, it exits upon receiving four types of signals.

The logs also clearly show that it received a SIGTERM signal, and SIGTERM is what kill sends by default.
So it’s unclear whether some script is killing it or whether something else is going on.
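
If you want to see how the container last terminated (exit code and a reason such as OOMKilled or Error), something like this should work; the Pod name is a placeholder:

    kubectl -n tidb-cluster-new get pod <discovery-pod> \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'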

| username: TiDBer_QYr0vohO | Original post link

It’s not normal. Try running kubectl describe pod <discovery> -n tidb-cluster-new to check.
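
The parts of the describe output that are usually worth checking (generic pointers, not specific to this cluster):

    kubectl describe pod <discovery-pod> -n tidb-cluster-new
    # Last State / Reason / Exit Code -> OOMKilled vs. error vs. clean exit
    # Restart Count                   -> how often it is being restarted
    # Events                          -> probe failures, evictions, back-off restarting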

| username: daylight | Original post link

It just shows that it was killed; there’s no other useful information.

| username: yiduoyunQ | Original post link

Check the operator logs.
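
For example (assuming the Operator was installed in the tidb-admin namespace with the default deployment name; adjust to match your install):

    kubectl -n tidb-admin logs deployment/tidb-controller-manager --tail=200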

| username: Jack-li | Original post link

Abnormal

| username: TiDBer_QYr0vohO | Original post link

Hmm, I can’t see any issues there. Do your k8s nodes have enough resources? You could change the startup command of the discovery Deployment’s Pod to keep it alive with tail -f, then exec into the Pod and run the discovery command manually to see whether it prints anything useful.
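
A rough sketch of that approach; the Deployment name, binary path, and namespace are assumptions, and note that the Operator may reconcile a manual edit back to the original spec:

    # Keep the container alive instead of starting discovery directly
    kubectl -n tidb-cluster-new edit deployment <cluster-name>-discovery
    #   command: ["tail", "-f", "/dev/null"]

    # Then exec in and run discovery by hand to watch its output
    kubectl -n tidb-cluster-new exec -it <discovery-pod> -- sh
    /usr/local/bin/tidb-discovery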

| username: daylight | Original post link

tidb-controller-manager logs (screenshot attached in the original post)

| username: yiduoyunQ | Original post link

From the logs, this is not triggered internally by the Operator. Start investigating at the k8s level: OOM kills, node scheduling, and so on.
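
Some generic starting points for the k8s-level check (not specific to this cluster):

    # Namespace events: back-off restarts, probe failures, evictions
    kubectl -n tidb-cluster-new get events --sort-by=.lastTimestamp

    # Node-level events: memory pressure, node restarts, scheduling issues
    kubectl get events -A --field-selector involvedObject.kind=Node

    # On the node hosting the discovery Pod, check for OOM killer activity
    dmesg -T | grep -i -e oom -e "killed process"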

| username: zhaokede | Original post link

Is it resolved?

| username: 友利奈绪 | Original post link

Try restarting.
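
If that means restarting the discovery Pod, deleting it lets its Deployment recreate it (namespace and Pod name are placeholders):

    kubectl -n tidb-cluster-new delete pod <discovery-pod>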

| username: kkpeter | Original post link

This thing doesn’t seem to be useful and doesn’t affect the cluster.

| username: 小龙虾爱大龙虾 | Original post link

Doesn’t this log show that it was killed from the k8s level?

| username: daylight | Original post link

No.

| username: daylight | Original post link

Restarted, didn’t work.

| username: daylight | Original post link

We’re currently trying to find out what is sending the kill signal.