TiDB Pod Not Created Properly, PD and TiKV Can Be Created Normally

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb Pod 未正常创建 pd, tikv能正常创建

| username: TiDBer_6Lizki07

When deploying the tidb-cluster, the pd, tikv, and discovery pods are all created normally, but the tidb pod is never created. There are no abnormal logs and nothing in the events, so I don't know what the cause is.

Here is the relevant output:

  • kubectl get pod
    NAME                                       READY   STATUS    RESTARTS   AGE
    advanced-tidb-discovery-6c65bf49fb-lwgqg   1/1     Running   0          8m38s
    advanced-tidb-pd-0                         1/1     Running   0          8m38s
    advanced-tidb-pd-1                         1/1     Running   0          8m38s
    advanced-tidb-tikv-0                       1/1     Running   0          8m13s
    advanced-tidb-tikv-1                       1/1     Running   0          8m13s
    tidb-controller-manager-75859464b-vb8x8    1/1     Running   0          131m

  • kubectl get deployment
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    advanced-tidb-discovery   1/1     1            1           8m38s
    tidb-controller-manager   1/1     1            1           114m

  • kubectl get tidbclusters -n frs-dev
    NAME READY PD STORAGE READY DESIRE TIKV STORAGE READY DESIRE TIDB READY DESIRE AGE
    advanced-tidb False pingcap/pd:v6.1.0 10Gi 2 2 50Gi 2 2 2 8m39s

  • kubectl get statefulsets -n frs-dev
    NAME                 READY   AGE
    advanced-tidb-pd     2/2     8m39s
    advanced-tidb-tikv   2/2     8m14s
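
To dig further when the pod simply never shows up, the TidbCluster status and the operator's own log are usually the most telling places to look. This is only a sketch: it assumes the namespaces shown in the output above, so adjust -n if the operator lives elsewhere.

    # Why the cluster is not Ready, according to the operator's status and events
    kubectl describe tidbcluster advanced-tidb -n frs-dev
    # What the operator is currently waiting on for this cluster
    kubectl logs deployment/tidb-controller-manager | grep -i advanced-tidb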

Here is the yaml configuration content of the tidb-cluster:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
  namespace: frs-dev

spec:
  #######################
  # Basic Configuration #
  #######################

  ## TiDB cluster version
  version: "v6.1.0"

  ## Time zone of TiDB cluster Pods
  timezone: UTC

  configUpdateStrategy: RollingUpdate

  helper:
    image: alpine:3.16.0
  pvReclaimPolicy: Retain

  nodeSelector:
    project: RECONPLFM

  ## Tolerations are applied to TiDB cluster pods, allowing (but not requiring) pods to be scheduled onto nodes with matching taints.
  ## This cluster-level `tolerations` only takes effect when no component-level `tolerations` are set.
  ## e.g. if `pd.tolerations` is not empty, `tolerations` here will be ignored.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: RECONPLFM
      operator: Equal
    # value: RECONPLFM

  enableDynamicConfiguration: true

  pd:
    ##########################
    # Basic PD Configuration #
    ##########################

    ## Base image of the component
    baseImage: pingcap/pd

    ## pd-server configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/pd-configuration-file
    config: |
      [dashboard]
        internal-proxy = true

    ## The desired replicas
    replicas: 2

    ## max number of in-progress failover PD pods
    maxFailoverCount: 0

    ## describes the compute resource requirements and limits.
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
    #   cpu: 1000m
    #   memory: 1Gi
      storage: 10Gi
    # limits:
    #   cpu: 2000m
    #   memory: 2Gi

    mountClusterClientSecret: true

    ## The storageClassName of the persistent volume for PD data storage.
    storageClassName: "cna-reconplfm-dev-nas"
  tidb:
    ############################
    # Basic TiDB Configuration #
    ############################

    ## Base image of the component
    baseImage: pingcap/tidb

    ## tidb-server Configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/tidb-configuration-file
    config: |
      [performance]
        tcp-keep-alive = true

    ## The desired replicas
    replicas: 2

    ## max number of in-progress failover TiDB pods
    maxFailoverCount: 0

    service:
      type: ClusterIP
    storageClassName: "cna-reconplfm-dev-nas"

  tikv:
    ############################
    # Basic TiKV Configuration #
    ############################

    ## Base image of the component
    baseImage: pingcap/tikv

    ## tikv-server configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/tikv-configuration-file
    config: |
      log-level = "info"

    ## The desired replicas
    replicas: 2

    ## max number of in-progress failover TiKV pods
    maxFailoverCount: 0

    ## describes the compute resource requirements.
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
    #   cpu: 1000m
    #   memory: 1Gi
      storage: 50Gi
    mountClusterClientSecret: true
    storageClassName: "cna-reconplfm-dev-nas"

| username: TiDBer_6Lizki07 | Original post link

I have roughly pinpointed the issue. The tidb-controller-manager log shows that it is waiting for the PD leader election to complete. Examining the cluster details further, the health of one tidb-pd node is false.

The corresponding error message in the tidb-pd node log is:
[ERROR] [etcdutil.go:126] ["load from etcd meet error"] [key=/pd/7160589068699979798/config] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]

However, the specific cause of this error message is unclear.
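
One way to confirm where the election is stuck (a sketch; it assumes the official pingcap/pd image, which ships a pd-ctl binary at /pd-ctl) is to query the members directly from inside a PD pod:

    # List the PD members and the current leader; if no leader is reported,
    # the election the controller-manager is waiting for has not completed
    kubectl exec -it advanced-tidb-pd-0 -n frs-dev -- \
      /pd-ctl -u http://127.0.0.1:2379 member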

| username: xfworld | Original post link

etcd, the core component embedded in PD, is not working properly.
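
If that is the suspicion, pd-ctl can also report per-member health (again only a sketch, assuming /pd-ctl is available in the PD image); a member whose health shows as false matches the symptom described above:

    # Show the health of each PD member (and of the etcd it embeds)
    kubectl exec -it advanced-tidb-pd-0 -n frs-dev -- \
      /pd-ctl -u http://127.0.0.1:2379 health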

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.