TiFlash Fails to Start After Deployment

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 部署tiflash启动失败

| username: atidat

【TiDB Usage Environment】Production Environment or Test Environment or POC
【TiDB Version】V6.2.0
【Encountered Problem】TiFlash failed to start
【Reproduction Path】What operations were performed to encounter the problem
After deploying the TiDB cluster with tidb-operator, I deployed TiFlash separately.
【Problem Phenomenon and Impact】
TiFlash failed to start.

【Attachments】Related logs and monitoring (https://metricstool.pingcap.com/)

[2022/08/23 11:16:51.617 +00:00] [ERROR] [<unknown>] ["Application:DB::Exception: The configuration \"storage.raft.dir\" should be an array of strings. Please check your configuration file."] [thread_id=1]

Corresponding source code:

Call site:
if (auto kvstore_paths = get_checked_qualified_array(table, "raft.dir"); kvstore_paths)
    kvstore_data_path = *kvstore_paths;

Where the error is raised:
auto get_checked_qualified_array = [log](const std::shared_ptr<cpptoml::table> table, const char * key) -> cpptoml::option<Strings> {
    auto throw_invalid_value = [log, key]() {
        String error_msg = fmt::format("The configuration \"storage.{}\" should be an array of strings. Please check your configuration file.", key);
        LOG_FMT_ERROR(log, "{}", error_msg);
        throw Exception(error_msg, ErrorCodes::INVALID_CONFIG_PARAMETER);
    };
    // not exist key
    if (!table->contains_qualified(key))
        return cpptoml::option<Strings>();

    // key exist, but not array
    auto qualified_ptr = table->get_qualified(key);
    if (!qualified_ptr->is_array())
    {
        throw_invalid_value();
    }
    // key exist, but can not convert to string array, maybe it is an int array
    auto string_array = table->get_qualified_array_of<String>(key);
    if (!string_array)
    {
        throw_invalid_value();
    }
    return string_array;
};
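
To make the three outcomes of that check easier to see, here is a rough Python restatement that works on an already-parsed table (a nested dict). It is only a sketch and not TiFlash code: the `InvalidConfig` class and the dict-walking lookup are assumptions made for illustration, and only the error message text is taken from the log above.

```python
from typing import Any, List, Optional


class InvalidConfig(Exception):
    """Hypothetical stand-in for the exception TiFlash throws."""


def get_checked_qualified_array(table: dict, key: str) -> Optional[List[str]]:
    """Look up a dotted key such as "raft.dir" and require a list of strings."""
    node: Any = table
    for part in key.split("."):
        # Key does not exist: not an error, the caller keeps its default.
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]

    message = (
        f'The configuration "storage.{key}" should be an array of strings. '
        "Please check your configuration file."
    )
    # Key exists but is not an array (for example a bare string).
    if not isinstance(node, list):
        raise InvalidConfig(message)
    # Key exists and is an array, but not every element is a string.
    if not all(isinstance(item, str) for item in node):
        raise InvalidConfig(message)
    return node


# The [storage] table with raft.dir written as a string instead of an array.
storage = {"raft": {"dir": "/data0/kvstore"}}
try:
    get_checked_qualified_array(storage, "raft.dir")
except InvalidConfig as exc:
    print(exc)
```

In this sketch a missing key is silently accepted, so the error in the log implies that some `storage.raft.dir` value was present but was not an array of strings.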


| username: atidat

The deployment reference I followed here does not mention any raft.dir configuration.

| username: ShawnYan

Is this configuration item missing from the TiFlash configuration file?

| username: 数据小黑

Reference: 专栏 - tiflash 6.0 on K8s 扩容与新特性实践 | TiDB 社区 (Column: TiFlash 6.0 on K8s, scale-out and new features in practice, TiDB Community)
Run:

kubectl edit tc basic -n tidb-cluster

Modify the TiFlash configuration as follows:

  tiflash:
    baseImage: uhub.service.ucloud.cn/pingcap/tiflash
    config:
      config: |
        [storage]
          [storage.main]
            dir = ["/data0/db"]
          [storage.raft]
            dir = ["/data0/kvstore"]
    maxFailoverCount: 3
    replicas: 3
    storageClaims:
    - resources:
        requests:
          storage: 10Gi

When adding config, note:

  1. The config key is nested twice (a config key inside config).
  2. If you configure anything under storage, every storage item must be specified explicitly. In practice, storage.main is the one most often left out, so be sure to check it.

Your configuration has some issues; adjust it along these lines and it should work.

| username: atidat

With this configuration, TiDB, TiKV, and PD cannot start.
FYI: the cluster was deployed through tidb-cluster.yaml.

| username: 数据小黑

Did you apply the change by running the command above, or by editing tidb-cluster.yaml and redeploying? The command format is kubectl edit tc ${cluster_name} -n ${namespace}.

| username: atidat

I modified tidb-cluster.yaml.

After making the change, I can see that the new configuration has been synced when I edit tc.

| username: 数据小黑

What is the error now? I suspect a formatting issue. Could you export the configuration with kubectl edit tc ${cluster_name} -n ${namespace}? I will run it through a formatting tool.

| username: atidat

Sure

tc.yaml (7.4 KB)

| username: 数据小黑

There’s no issue with the format. What error are you encountering now?

| username: atidat

A single operator can theoretically connect to multiple TiDB clusters, right?

I didn’t encounter this issue when deploying tidb-admin and tidb-cluster from scratch in a brand new k8s environment.

| username: atidat

There are no errors, but no Deployment or StatefulSet resources are being created. The kubectl get command returns nothing.