PD Pod nslookup domain failed: domain field is abnormal

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd Pod nslookup domain failed, domain field is abnormal

| username: TiDBer_3VVmUc3d

【TiDB Environment】Test Environment
【TiDB Version】Both 6.1 & 5.4
【Encountered Issue】When deploying the TiDB cluster, the container in the basic-pd-0 Pod fails and restarts after 30 seconds. The logs show that the domain is concatenated incorrectly, with the following error:

    ** server can't find basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local.basic-pd-peer.tidb.cluster.svc: NXDOMAIN
    nslookup domain basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local.basic-pd-peer.tidb-cluster.svc failed

Using startUpScriptVersion: "v1" results in a similar error:

    domain resolve basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local.basic-pd-peer.tidb-cluster.svc no record return

This error most likely originates in the pdStartScriptTpl template in manager/member/template.go (line 120), where the domain variable is concatenated incorrectly. The expected and actual results are:

    basic-pd-0.basic-pd-peer.tidb-cluster.svc # Correct result
    basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local.basic-pd-peer.tidb-cluster.svc # Actual incorrect result
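
For illustration, here is a minimal POSIX sh sketch of how the doubled suffix can arise. It is not the actual template: it assumes the script builds the domain as `<pod name>.<peer service>.<namespace>.svc` with `POD_NAME` falling back to `$HOSTNAME`, and the helper `build_domain` and the variables `PEER_SERVICE_NAME` and `NAMESPACE` are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Hypothetical reconstruction of how the PD start script assembles its domain.
# build_domain, PEER_SERVICE_NAME and NAMESPACE are illustrative names only.

build_domain() {
    # $1 = pod name, $2 = peer (headless) service, $3 = namespace
    printf '%s.%s.%s.svc\n' "$1" "$2" "$3"
}

PEER_SERVICE_NAME=basic-pd-peer
NAMESPACE=tidb-cluster

# Case 1: POD_NAME is empty/unset and HOSTNAME is the short hostname.
POD_NAME=""
HOSTNAME=basic-pd-0
POD_NAME=${POD_NAME:-$HOSTNAME}
build_domain "$POD_NAME" "$PEER_SERVICE_NAME" "$NAMESPACE"
# prints basic-pd-0.basic-pd-peer.tidb-cluster.svc

# Case 2: POD_NAME is empty/unset and HOSTNAME already holds the FQDN,
# as seen with `echo $HOSTNAME` later in this thread.
POD_NAME=""
HOSTNAME=basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local
POD_NAME=${POD_NAME:-$HOSTNAME}
build_domain "$POD_NAME" "$PEER_SERVICE_NAME" "$NAMESPACE"
# prints basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local.basic-pd-peer.tidb-cluster.svc
```

The only difference between the two cases is the value of `$HOSTNAME`, which is why the fix discussed later in the thread targets the fallback line rather than the concatenation itself.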

Is this a bug? Has anyone encountered this issue before?
【TiDB Operator Version】: 1.3.7
【K8s Version】: 1.20 & containerd 1.2.10

| username: xfworld

What does your K8s environment use to provide DNS resolution? Does it meet the basic requirements of TiDB Operator?

| username: yiduoyunQ

Have you configured tc.clusterDomain? Can you share your tc (TidbCluster) configuration?

| username: TiDBer_3VVmUc3d

The K8s environment uses CoreDNS, and the environment itself should be fine.
The tc.clusterDomain field is not set, and the tc configuration is identical to the quickstart example.

I have found the cause. The PD startup script pdStartScriptTpl in manager/member/template.go (after a recent update on GitHub, the script lives in tidb-operator/charts/tidb-cluster/templates/scripts/_start_pd.sh.tpl) contains this line:

    POD_NAME=${POD_NAME:-$HOSTNAME}

If kubelet does not inject $POD_NAME into the container, the script falls back to $HOSTNAME. In this environment, $HOSTNAME contains not only the container's hostname but also the peer service, namespace, and cluster domain, so the assembled domain is wrong. The line should be changed to:

    POD_NAME=${POD_NAME:-$(hostname)}

After making this change, the result is now correct on my end.
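
The `$(hostname)` change depends on a `hostname` binary being present in the PD image. A variant that needs no external command is POSIX parameter expansion, which keeps only the part of `$HOSTNAME` before the first dot. This is a hypothetical alternative, not the upstream fix: `short_pod_name` is an illustrative helper, and the approach assumes the Pod name itself contains no dots (true for names such as `basic-pd-0`).

```shell
#!/bin/sh
# Hypothetical helper: prefer an injected POD_NAME; otherwise keep only the
# first DNS label of HOSTNAME (everything before the first dot).
# Assumes the Pod name itself contains no dots.
short_pod_name() {
    # $1 = POD_NAME (may be empty), $2 = HOSTNAME
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${2%%.*}"
    fi
}

short_pod_name "" "basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local"   # prints basic-pd-0
short_pod_name "" "basic-pd-0"                                                # prints basic-pd-0
short_pod_name "basic-pd-1" "basic-pd-0.basic-pd-peer.tidb-cluster.svc"       # prints basic-pd-1

# The same idea as a one-line replacement for the fallback in the start script:
HOSTNAME=basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local
POD_NAME=""
POD_NAME=${POD_NAME:-${HOSTNAME%%.*}}
echo "$POD_NAME"   # prints basic-pd-0
```

In both forms an explicitly injected `POD_NAME` still takes precedence, so environments where kubelet does set the variable behave as before.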

| username: Min_Chen

Hi, can you share your tc definition YAML and the Pod YAML? Please also share the output of:

    kubectl -n kube-system get cm/coredns -o yaml --export
    kubectl -n tidb-test exec -ti tidb4012-pd-0 -- cat /etc/resolv.conf

| username: TiDBer_3VVmUc3d

The tc YAML is unmodified: tidb-operator/examples/basic/tidb-cluster.yaml on the master branch of pingcap/tidb-operator on GitHub.
The Pod's YAML: basic-pd-0.yaml (attached)
cm/coredns YAML: cm-coredns.yaml (attached)

/etc/resolv.conf

    search tidb-cluster.svc.cluster.local svc.cluster.local cluster.local
    nameserver 22.0.0.10
    options ndots:5
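
To see why the expected short domain can resolve while the malformed one cannot, here is a sketch of the lookup order that glibc's resolv.conf(5) documents for `search` and `options ndots:n`: a name with at least ndots dots is tried as an absolute name first, otherwise the search suffixes are tried first. Treat this as an assumption about the resolver in use, since musl-based (Alpine) images may handle search domains differently. `candidate_names` is a hypothetical helper that only lists candidate names; it does not query DNS.

```shell
#!/bin/sh
# Sketch of the glibc resolv.conf(5) "search" + "ndots" ordering.
# Lists candidate query names only; trailing-dot names are not handled.

candidate_names() (
    # $1 = name to resolve, $2 = ndots, remaining args = search domains
    name=$1
    ndots=$2
    shift 2

    # Count the dots in the name using POSIX parameter expansion.
    dots=0
    rest=$name
    while [ "${rest#*.}" != "$rest" ]; do
        rest=${rest#*.}
        dots=$((dots + 1))
    done

    if [ "$dots" -ge "$ndots" ]; then
        # Enough dots: try the name as absolute first, then the search list.
        printf '%s\n' "$name"
        for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
    else
        # Too few dots: try the search list first, then the name as absolute.
        for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
        printf '%s\n' "$name"
    fi
)

# Search list from the Pod's /etc/resolv.conf (left unquoted below on
# purpose so that it splits into separate arguments).
SEARCH="tidb-cluster.svc.cluster.local svc.cluster.local cluster.local"

# Expected domain (3 dots, fewer than ndots:5).
candidate_names basic-pd-0.basic-pd-peer.tidb-cluster.svc 5 $SEARCH

# Malformed domain (8 dots, at least ndots:5).
candidate_names basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local.basic-pd-peer.tidb-cluster.svc 5 $SEARCH
```

For the expected domain, the third candidate is basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local, the Pod's record under the headless peer Service. For the malformed domain, none of the candidates would correspond to a Pod or Service record, which is consistent with the NXDOMAIN in the first post.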

Also note the output of these commands inside the container, opened with kubectl -n tidb-cluster exec -it basic-pd-0 -- sh:

    / # echo $HOSTNAME
    basic-pd-0.basic-pd-peer.tidb-cluster.svc.cluster.local
    / # hostname
    basic-pd-0

| username: Min_Chen

I couldn’t reproduce your issue locally.
I noticed the word “ali” in your YAML file. Could you please explain how your environment was created? How were the TiDB operator and TiDB cluster deployed?

| username: TiDBer_3VVmUc3d

The K8s environment is a managed Alibaba Cloud ACK cluster (version 1.20.4), and the deployment followed the steps in the TiDB quick start guide.

| username: TiDBer_I9wlKOqz

I also encountered this error. How did you solve it?