Failed to deploy TidbCluster, indicating TiKVStoreNotUp

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TidbCluster集群部署失败, 提示TiKVStoreNotUp

| username: TiDBer_6Lizki07

[TiDB Usage Environment] Test Environment
[TiDB Version] v6.1.0
[Encountered Issue] Cluster deployment failed
[Issue Phenomenon and Impact]
While deploying the cluster, it keeps reporting “TiKV store(s) are not up”, even though the TiKV pods are all in the Running state.

+ kubectl get pod
NAME                                       READY   STATUS    RESTARTS   AGE
advanced-tidb-discovery-6c65bf49fb-kmmmn   1/1     Running   0          62m
advanced-tidb-pd-0                         1/1     Running   0          62m
advanced-tidb-tikv-0                       1/1     Running   0          62m
advanced-tidb-tikv-1                       1/1     Running   0          22m
advanced-tidb-tikv-2                       1/1     Running   0          22m

I checked the logs inside the TiKV pods and found nothing abnormal.

The logs of the tidb-controller-manager keep printing the following information:

I1101 08:44:09.534955       1 tikv_member_manager.go:808] TiKV of Cluster frs-dev/advanced-tidb not bootstrapped yet
I1101 08:44:09.555285       1 tikv_member_manager.go:906] TiKV of Cluster frs-dev/advanced-tidb is not bootstrapped yet, no need to set store labels
I1101 08:44:09.555981       1 tidb_cluster_controller.go:127] TidbCluster: frs-dev/advanced-tidb, still need sync: TidbCluster: [frs-dev/advanced-tidb], waiting for TiKV cluster running, requeuing

Does anyone know what the reason might be?
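
For reference, one way to see what the operator sees is to ask PD directly whether any TiKV stores have registered. A minimal sketch, assuming the advanced-tidb-pd service that TiDB Operator normally creates and local access via port-forward:

# Forward the PD client port to the local machine
kubectl -n frs-dev port-forward svc/advanced-tidb-pd 2379:2379

# In another terminal: list the stores PD knows about.
# An empty store list means no TiKV ever registered, which matches "not bootstrapped yet".
curl http://127.0.0.1:2379/pd/api/v1/stores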

| username: xfworld | Original post link

The TiKV store labels have not been set.

TiKV of Cluster frs-dev/advanced-tidb is not bootstrapped yet, no need to set store labels
You need to set the store labels.

| username: TiDBer_6Lizki07 | Original post link

How do I set these labels?

But the log says “is not bootstrapped yet, no need to set store labels,” which suggests the labels don’t need to be set.
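
For reference, if store labels did need to be set: with TiDB Operator they are normally not set by hand; you configure location-labels in the PD config and make sure the Kubernetes nodes carry labels whose keys match, and the operator then applies the store labels once the stores are up. A hedged sketch of the PD config fragment, assuming the same pd.config block used in this cluster and that matching node labels exist:

  pd:
    config: |
      lease = 1800
      [replication]
        location-labels = ["zone", "host"]
      [dashboard]
        internal-proxy = true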

| username: xfworld | Original post link

Which access endpoint have you tried?

| username: TiDBer_6Lizki07 | Original post link

Are you referring to accessing the database? It’s not accessible, because the TiDB service hasn’t started yet. The deployment is stuck at this point, and the TiDB service that comes next has not been created.

+ kubectl get tidbclusters -n frs-dev
NAME            READY   PD                  STORAGE   READY   DESIRE   TIKV   STORAGE   READY   DESIRE   TIDB   READY   DESIRE   AGE
advanced-tidb   False   pingcap/pd:v6.1.0   10Gi      1       1               50Gi      3       3                       2        62m
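
For reference, the reason behind that READY=False can also be pulled straight out of the CR status (a sketch, using the tc short name that TiDB Operator registers for tidbclusters):

kubectl -n frs-dev get tc advanced-tidb -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{": "}{.message}{"\n"}{end}'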

| username: xfworld | Original post link

Check the scheduling events. Where is it getting stuck?

| username: TiDBer_6Lizki07 | Original post link

Uh, what do you mean by scheduling events? How do you check them?

| username: xfworld | Original post link

Since you’re already using K8s, you should understand how a pod gets created, right?

In other words, you should at least check the pod-creation logs and see why it keeps getting stuck here.
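
For reference, scheduling and creation events are usually visible like this (a sketch, using the namespace and pod names from this thread):

# Events for the whole namespace, oldest first
kubectl -n frs-dev get events --sort-by=.metadata.creationTimestamp

# Events and conditions for a single pod
kubectl -n frs-dev describe pod advanced-tidb-tikv-0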

| username: TiDBer_6Lizki07 | Original post link

I have checked the pod logs, and there is nothing abnormal. Only the tidb-controller-manager reports that the TiKV cluster is not up.

The TiDB StatefulSet has not been created. I think it’s because TiDB Operator sees that TiKV is not ready and therefore hasn’t gone on to create the next component. But I don’t know why it considers the TiKV cluster abnormal; I have checked the logs of several TiKV pods, and they all look normal.

+ kubectl get statefulsets -n frs-dev
NAME                 READY   AGE
advanced-tidb-pd     1/1     62m
advanced-tidb-tikv   3/3     62m
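
For reference, the operator’s notion of “TiKV up” comes from PD’s store list rather than from pod readiness, so it can also help to check the store list from inside the PD pod (a sketch; the pd-ctl binary path may differ depending on the image version):

# pd-ctl ships in the pingcap/pd image; an empty store list means no TiKV ever registered with PD
kubectl -n frs-dev exec -it advanced-tidb-pd-0 -- /pd-ctl -u http://127.0.0.1:2379 store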

| username: xfworld | Original post link

Check the network first. Can the tidb-controller-manager reach the TiKV pods? Can the PD and TiKV pods communicate with each other?
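
For reference, a quick way to probe connectivity without touching the TiDB images themselves is a throwaway busybox pod (a sketch; net-check is an arbitrary name, and the service name follows the <cluster>-pd convention created by the operator):

# Can pods in the namespace reach PD's client port?
kubectl -n frs-dev run net-check --rm -it --restart=Never --image=busybox -- \
  wget -qO- http://advanced-tidb-pd.frs-dev.svc:2379/pd/api/v1/members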

| username: TiDBer_6Lizki07 | Original post link

The network is fine; the TiKV logs show it has already connected to PD.

| username: xfworld | Original post link

So that’s the problem then.

| username: TiDBer_6Lizki07 | Original post link

“TiKV of Cluster frs-dev/advanced-tidb not bootstrapped yet”

Right, I just don’t know why it thinks TiKV hasn’t finished bootstrapping, or what that bootstrap step actually involves. The main issue is that there are no error logs in the TiKV pods, only heartbeat logs every 10 minutes.
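
For reference, the “bootstrapped” check in tidb-controller-manager is about whether the PD-side cluster has been initialized by a TiKV store, and PD exposes that state over its API (a sketch, assuming the same port-forward as in the earlier sketch; exact response fields may vary by version):

# is_initialized=false (or a zero raft_bootstrap_time) matches the "not bootstrapped yet" log
curl http://127.0.0.1:2379/pd/api/v1/cluster/status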

| username: xfworld | Original post link

Check this advanced-tidb; that should be the TiDB instance name, right?

| username: TiDBer_6Lizki07 | Original post link

This is the cluster name; there is no pod with that name.

| username: xfworld | Original post link

Check the logs of all the pods; otherwise it’s hard to pin down.

The baseline requirements for the K8s environment are also fairly involved.

| username: TiDBer_6Lizki07 | Original post link

Uh, the created pods are all in the Running state, and there are no notable errors in their logs. However, the TiDB pods were not created, and the corresponding TiDB StatefulSet was not created either.

Here are the details of the cluster:

+ kubectl describe tidbclusters -n frs-dev

Name:         advanced-tidb
Namespace:    frs-dev
Labels:       <none>
Annotations:  <none>
API Version:  pingcap.com/v1alpha1
Kind:         TidbCluster
Metadata:
  Creation Timestamp:  2022-11-01T07:32:45Z
  Generation:          17
  Resource Version:    3871439862
  Self Link:           /apis/pingcap.com/v1alpha1/namespaces/frs-dev/tidbclusters/advanced-tidb
  UID:                 dae58b62-7e57-4ac1-96ae-8c9b1680bb57
Spec:
  Config Update Strategy:  RollingUpdate
  Discovery:
  Enable Dynamic Configuration:  true
  Enable PV Reclaim:             false
  Helper:
    Image:            alpine:3.16.0
  Image Pull Policy:  IfNotPresent
  Node Selector:
    Project:  RECONPLFM
  Pd:
    Base Image:  pingcap/pd
    Config:      lease = 1800

[dashboard]
  internal-proxy = true

    Max Failover Count:           0
    Mount Cluster Client Secret:  false
    Replicas:                     1
    Requests:
      Storage:           10Gi
    Storage Class Name:  cna-reconplfm-dev-nas
  Pv Reclaim Policy:     Retain
  Tidb:
    Base Image:  pingcap/tidb
    Config:      [log]
  [log.file]
    max-backups = 3

[performance]
  tcp-keep-alive = true

    Max Failover Count:  0
    Replicas:            2
    Service:
      Type:              ClusterIP
    Storage Class Name:  cna-reconplfm-dev-nas
  Tikv:
    Base Image:  pingcap/tikv
    Config:      log-level = "info"

    Max Failover Count:           0
    Mount Cluster Client Secret:  false
    Replicas:                     3
    Requests:
      Storage:           50Gi
    Storage Class Name:  cna-reconplfm-dev-nas
  Timezone:              UTC
  Tls Cluster:
  Tolerations:
    Effect:    NoSchedule
    Key:       RECONPLFM
    Operator:  Equal
  Version:     v6.1.0
Status:
  Cluster ID:  7160932248881483001
  Conditions:
    Last Transition Time:  2022-11-01T07:32:45Z
    Last Update Time:      2022-11-01T07:33:07Z
    Message:               TiKV store(s) are not up
    Reason:                TiKVStoreNotUp
    Status:                False
    Type:                  Ready
  Pd:
    Image:  pingcap/pd:v6.1.0
    Leader:
      Client URL:            http://advanced-tidb-pd-0.advanced-tidb-pd-peer.frs-dev.svc:2379
      Health:                true
      Id:                    11005745135123337789
      Last Transition Time:  2022-11-01T07:33:06Z
      Name:                  advanced-tidb-pd-0
    Members:
      advanced-tidb-pd-0:
        Client URL:            http://advanced-tidb-pd-0.advanced-tidb-pd-peer.frs-dev.svc:2379
        Health:                true
        Id:                    11005745135123337789
        Last Transition Time:  2022-11-01T07:33:06Z
        Name:                  advanced-tidb-pd-0
    Phase:                     Normal
    Stateful Set:
      Collision Count:      0
      Current Replicas:     1
      Current Revision:     advanced-tidb-pd-59466586bc
      Observed Generation:  1
      Ready Replicas:       1
      Replicas:             1
      Update Revision:      advanced-tidb-pd-59466586bc
      Updated Replicas:     1
    Synced:                 true
  Pump:
  Ticdc:
  Tidb:
  Tiflash:
  Tikv:
    Phase:  Normal
    Stateful Set:
      Collision Count:      0
      Current Replicas:     3
      Current Revision:     advanced-tidb-tikv-66f457c77b
      Observed Generation:  3
      Ready Replicas:       3
      Replicas:             3
      Update Revision:      advanced-tidb-tikv-66f457c77b
      Updated Replicas:     3
    Synced:                 true
Events:                     <none>
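
For reference, note that the Tikv section above only has a StatefulSet block; the registered stores would normally appear under .status.tikv.stores in the TidbCluster CR, which can be checked directly (a sketch; the field name is as used by TiDB Operator’s CRD and may vary by operator version):

kubectl -n frs-dev get tc advanced-tidb -o jsonpath='{.status.tikv.stores}'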

Here is my cluster configuration file; please check whether there are any issues:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
  namespace: frs-dev

spec:
  #######################
  # Basic Configuration #
  #######################

  ## TiDB cluster version
  version: "v6.1.0"

  ## Time zone of TiDB cluster Pods
  timezone: UTC

  ## serviceAccount specifies the service account for PD/TiDB/TiKV/TiFlash/Pump/TiCDC components in this TidbCluster
  # serviceAccount: advanced-tidb

  ## ConfigUpdateStrategy determines how the configuration change is applied to the cluster.
  ## Valid values are `InPlace` and `RollingUpdate`
  ##   UpdateStrategy `InPlace` will update the ConfigMap of configuration in-place and an extra rolling update of the
  ##   cluster component is needed to reload the configuration change.
  ##   UpdateStrategy `RollingUpdate` will create a new ConfigMap with the new configuration and rolling update the
  ##   related components to use the new ConfigMap, that is, the new configuration will be applied automatically.
  configUpdateStrategy: RollingUpdate

  ## ImagePullPolicy of TiDB cluster Pods
  ## Ref: https://kubernetes.io/docs/concepts/configuration/overview/#container-images
  # imagePullPolicy: IfNotPresent

  ## If private registry is used, imagePullSecrets may be set
  ## You can also set this in service account
  ## Ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # imagePullSecrets:
  # - name: secretName

  ## Image used to do miscellaneous tasks as sidecar container, such as:
  ## - execute sysctls when PodSecurityContext is set for some components, requires `sysctl` installed
  ## - tail slow log for tidb, requires `tail` installed
  ## - fill tiflash config template file based on pod ordinal
  helper:
    image: alpine:3.16.0
  # imagePullPolicy: IfNotPresent

  ## Enable PVC/PV reclaim for orphan PVC/PV left by statefulset scale-in.
  ## When set to `true`, PVC/PV that are not used by any tidb cluster pods will be deleted automatically.
  # enablePVReclaim: false

  ## Persistent volume reclaim policy applied to the PV consumed by the TiDB cluster, default to `Retain`.
  ## Note that the reclaim policy Recycle may not be supported by some storage types, e.g. local.
  ## Ref: https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/
  pvReclaimPolicy: Retain

  ##########################
  # Advanced Configuration #
  ##########################

  ## when deploying a heterogeneous TiDB cluster, you MUST specify the cluster name to join here
  # cluster:
  #   namespace: default
  #   name: tidb-cluster-to-join
  #   clusterDomain: cluster.local

  ## specifying pdAddresses will make PD in this TiDB cluster to join another existing PD cluster
  ## PD will then start with arguments --join= instead of --initial-cluster=
  # pdAddresses:
  #   - http://cluster1-pd-0.cluster1-pd-peer.default.svc:2379
  #   - http://cluster1-pd-1.cluster1-pd-peer.default.svc:2379

  ## Enable mutual TLS connection between TiDB cluster components
  ## Ref: https://docs.pingcap.com/tidb-in-kubernetes/stable/enable-tls-between-components/
  # tlsCluster:
  #   enabled: true

  ## Annotations of TiDB cluster pods, will be merged with component annotation settings.
  # annotations:
  #   node.kubernetes.io/instance-type: some-vm-type
  #   topology.kubernetes.io/region: some-region

  ## NodeSelector of TiDB cluster pods, will be merged with component nodeSelector settings.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  nodeSelector:
    project: RECONPLFM

  ## Tolerations are applied to TiDB cluster pods, allowing (but do not require) pods to be scheduled onto nodes with matching taints.
  ## This cluster-level `tolerations` only takes effect when no component-level `tolerations` are set.
  ## e.g. if `pd.tolerations` is not empty, `tolerations` here will be ignored.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: RECONPLFM
      operator: Equal
    # value: RECONPLFM

  ## Use the node network namespace, default to false
  ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces
  # hostNetwork: false

  ## specify resource requirements for discovery deployment
  # discovery:
  #   requests:
  #     cpu: 1000m
  #     memory: 256Mi
  #   limits:
  #     cpu: 2000m
  #     memory: 1Gi
  #   ## The following block overwrites TiDB cluster-level configurations in `spec`
  #   imagePullPolicy: IfNotPresent
  #   imagePullSecrets: secretName
  #   hostNetwork: false
  #   priorityClassName: system-cluster-critical
  #   schedulerName: default-scheduler
  #   nodeSelector:
  #     app.kubernetes.io/component: discovery
  #   annotations:
  #     node.kubernetes.io/instance-type: some-vm-type
  #   labels: {}
  #   env:
  #     - name: MY_ENV_1
  #       value: value1
  #   affinity: {}
  #   tolerations:
  #     - effect: NoSchedule
  #       key: dedicated
  #       operator: Equal
  #       value: discovery

  ## if true, this tidb cluster is paused and will not be synced by the controller
  # paused: false

  ## SchedulerName of TiDB cluster pods.
  ## If specified, the pods will be scheduled by the specified scheduler.
  ## Can be overwritten by component settings.
  # schedulerName: default-scheduler

  ## PodManagementPolicy default `OrderedReady` for Pump
  ## and default `Parallel` for the other components.
  ## Ref: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-management-policies
  # podManagementPolicy: Parallel

  ## Affinity for pod scheduling, will be overwritten by each cluster component's specific affinity setting
  ## Can refer to PD/TiDB/TiKV affinity settings, and ensure only cluster-scope general settings here
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # affinity: {}

  ## Specify pod priorities of pods in TidbCluster, default to empty.
  ## Can be overwritten by component settings.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
  # priorityClassName: system-cluster-critical

  ## If set to `true`, `--advertise-status-addr` will be appended to the startup parameters of TiKV
  enableDynamicConfiguration: true

  ## Set update strategy of StatefulSet, can be overwritten by the setting of each component.
  ## defaults to RollingUpdate
  ## Ref: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies
  # statefulSetUpdateStrategy: RollingUpdate

  ## The identifier of the Pod will be `$(podName).$(serviceName).$(namespace).svc.$(clusterDomain)` when `clusterDomain` is set.
  ## Set this in the case where a TiDB cluster is deployed across multiple Kubernetes clusters. default to empty.
  # clusterDomain: cluster.local

  ## TopologySpreadConstraints for pod scheduling, will be overwritten by each cluster component's specific spread constraints setting
  ## Can refer to PD/TiDB/TiKV/TiCDC/TiFlash/Pump topologySpreadConstraints settings, and ensure only cluster-scope general settings here
  ## Ref: pkg/apis/pingcap/v1alpha1/types.go#TopologySpreadConstraint
  # topologySpreadConstraints:
  # - topologyKey: topology.kubernetes.io/zone

  ###########################
  # TiDB Cluster Components #
  ###########################

  pd:
    ##########################
    # Basic PD Configuration #
    ##########################

    ## Base image of the component
    baseImage: pingcap/pd

    ## pd-server configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/pd-configuration-file
    config: |
      lease = 1800
      [dashboard]
        internal-proxy = true

    ## The desired replicas
    replicas: 1

    ## max inprogress failover PD pod counts
    maxFailoverCount: 0

    ## describes the compute resource requirements and limits.
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
    #   cpu: 1000m
    #   memory: 1Gi
      storage: 10Gi
    # limits:
    #   cpu: 2000m
    #   memory: 2Gi

    ## defines Kubernetes service for pd-server
    ## Ref: https://kubernetes.io/docs/concepts/services-networking/service/
    # service:
    #   type: ClusterIP
    #   annotations:
    #     foo: bar
    #   portName: client

    #############################
    # Advanced PD Configuration #
    #############################

    ## The following block overwrites TiDB cluster-level configurations in `spec`
    # version: "v6.1.0"
    # imagePullPolicy: IfNotPresent
    # imagePullSecrets:
    # - name: secretName
    # hostNetwork: false
    # serviceAccount: advanced-tidb-pd
    # priorityClassName: system-cluster-critical
    # schedulerName: default-scheduler
    # nodeSelector:
    #   app.kubernetes.io/component: pd
    # annotations:
    #   node.kubernetes.io/instance-type: some-vm-type
    # tolerations:
    #   - effect: NoSchedule
    #     key: dedicated
    #     operator: Equal
    #     value: pd
    # configUpdateStrategy: RollingUpdate
    # statefulSetUpdateStrategy: RollingUpdate

    ## List of environment variables to set in the container
    ## Note that the following env names cannot be used and will be overwritten by TiDB Operator builtin envs
    ##   - NAMESPACE
    ##   - TZ
    ##   - SERVICE_NAME
    ##   - PEER_SERVICE_NAME
    ##   - HEADLESS_SERVICE_NAME
    ##   - SET_NAME
    ##   - HOSTNAME
    ##   - CLUSTER_NAME
    ##   - POD_NAME
    ##   - BINLOG_ENABLED
    ##   - SLOW_LOG_FILE
    ## Ref: https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
    # env:
    # - name: MY_ENV_1
    #   value: value1
    # - name: MY_ENV_2
    #   valueFrom:
    #     fieldRef:
    #       fieldPath: status.myEnv2

    ## Custom sidecar containers can be injected into the PD pods,
    ## which can act as a logging/tracing agent or for any other use case
    # additionalContainers:
    # - name: myCustomContainer
    #   image: ubuntu

    ## custom additional volumes in PD pods
    # additionalVolumes:
    # # specify volume types that are supported by Kubernetes, Ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#types-of-persistent-volumes
    # - name: nfs
    #   nfs:
    #     server: 192.168.0.2
    #     path: /nfs

    ## custom additional volume mounts in PD pods
    # additionalVolumeMounts:
    # # this must match `name` in `additionalVolumes`
    # - name: nfs
    #   mountPath: /nfs

    ## Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
    ## Ref: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution
    # terminationGracePeriodSeconds: 30

    ## PodSecurityContext holds pod-level security attributes and common container settings.
    ## Ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
    # podSecurityContext:
    #   sysctls:
    #   - name: net.core.somaxconn
    #     value: "32768"

    ## when TLS cluster feature is enabled, TiDB Operator will automatically mount the cluster client certificates if mountClusterClientSecret is set to true
    ## Defaults to false
    ## Ref: https://docs.pingcap.com/tidb-in-kubernetes/stable/configure-a-tidb-cluster#mountclusterclientsecret
    mountClusterClientSecret: false

    ## The storageClassName of the persistent volume for PD data storage.
    storageClassName: "cna-reconplfm-dev-nas"

    ## defines additional volumes for which PVCs will be created by StatefulSet controller
    # storageVolumes:
    #   # this

| username: xfworld | Original post link

How did you deploy it? I recommend you try using Helm.

I didn’t see any issues with the configuration file.

| username: TiDBer_6Lizki07 | Original post link

I followed the steps in the official documentation and used Helm to deploy TiDB Operator. I used kubectl apply -f ${cluster_name} -n ${namespace} to deploy the TiDB cluster. Are you referring to deploying the TiDB cluster using Helm? Can you provide the link to the instructions?
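
For reference, the commonly documented sequence is indeed Helm for TiDB Operator and kubectl apply for the TidbCluster CR (a sketch of that flow; the chart repo is the one PingCAP publishes, while the namespace names, chart version handling, and the advanced-tidb.yaml filename here are assumptions):

helm repo add pingcap https://charts.pingcap.org/
helm repo update

# The operator goes in via Helm (the CRDs are installed separately with kubectl per the official docs)
helm install tidb-operator pingcap/tidb-operator --namespace tidb-admin --create-namespace

# The TiDB cluster itself is applied as a CR, as done above
kubectl apply -f advanced-tidb.yaml -n frs-dev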

| username: xfworld | Original post link

You need to get the K8s environment set up properly first; it has to match the requirements.