After Enabling TLS, the Cluster Fails to Start When Scaling Out PD Nodes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 开启TLS后再扩容PD节点集群起不来

| username: starCrush

[TiDB Usage Environment] Production Environment / Testing
[TiDB Version] v5.4.2
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
Operations:

  1. Scale in PD to one node
  2. Enable TLS: tiup cluster tls tidb-test enable
  3. Scale out PD
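
The three steps above can be sketched as tiup commands. This is only a rough outline, not the poster's exact commands: the node addresses and the scale-out.yaml file name are hypothetical placeholders, and the script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. By default the commands are only
# printed; set DRY_RUN=0 to actually execute them.
CLUSTER="tidb-test"   # cluster name taken from the post

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1, $2: PD nodes to remove (host:port); $3: scale-out topology file.
# All three values are hypothetical placeholders.
repro_steps() {
  run tiup cluster scale-in "$CLUSTER" --node "$1"   # step 1: scale PD in ...
  run tiup cluster scale-in "$CLUSTER" --node "$2"   # ... down to one node
  run tiup cluster tls "$CLUSTER" enable             # step 2: enable TLS
  run tiup cluster scale-out "$CLUSTER" "$3"         # step 3: scale PD out
  run tiup cluster display "$CLUSTER"                # check node status
}

repro_steps 10.0.1.2:2379 10.0.1.3:2379 scale-out.yaml
```

Printing the commands first makes it easy to review the sequence before touching a real cluster.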

After the scale-out, the output indicated that the PD nodes could not start, and all three PD nodes were in the Down state.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Logs from the failing node (newly scaled out): pd_stderr.log, pd.log
Logs from the original PD leader node: (screenshot not preserved)

| username: Lucien-卢西恩

Was TLS enabled at deployment time, or only after the cluster had been running for a while? Are TiKV nodes experiencing similar issues?

| username: starCrush

It was enabled after the cluster had been in use for a while. I enabled it because the etcd port had no authentication. After enabling it, I also scaled out TiKV, and that worked without any issues.

| username: starCrush

I deployed a new cluster, scaled PD in, enabled TLS, and then scaled PD out again, but PD still won't start.

| username: CuteRay

Can the newly added machine reach port 2379 on the IP shown in the logs?

| username: starCrush

Yes, the port is reachable via telnet.
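
A telnet check only shows that the TCP port is open. Once TLS is enabled, it may also be worth checking that the endpoint answers over HTTPS with the cluster certificates. The sketch below only prints a curl command for the PD members API. The certificate directory and the file names ca.crt, client.crt, and client.pem are assumptions about where tiup keeps the client certificates, and the address is a placeholder.

```shell
#!/bin/sh
# Prints (does not run) a curl command that queries the PD members API over TLS.
# $1: PD address (host:port), $2: directory holding the client certificate files.
# The file names ca.crt, client.crt and client.pem are assumptions.
pd_members_cmd() {
  printf 'curl --cacert %s/ca.crt --cert %s/client.crt --key %s/client.pem https://%s/pd/api/v1/members\n' \
    "$2" "$2" "$2" "$1"
}

pd_members_cmd 10.0.1.1:2379 "$HOME/.tiup/storage/cluster/clusters/tidb-test/tls"
```

If the printed command fails with a certificate or handshake error while telnet succeeds, the problem is more likely in the TLS setup than in network reachability.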

| username: caiyfc

Is this issue causing the problem? What version of tiup are you using? It needs to be 1.11.0 or above.
You can check this for more details:
Column - The Trials and Tribulations of Enabling Encrypted Communication TLS in TiDB Production Cluster - Opening Chapter | TiDB Community
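
A quick way to check the version requirement is sketched below. The version_ge helper is a hypothetical name, and it assumes a sort that supports -V (version sort). The version string to compare is the one reported by tiup --version.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (version sort), available in GNU coreutils and recent BSD sort.
version_ge() {
  a=${1#v}; b=${2#v}   # drop an optional leading "v"
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# Usage: pass the version reported by `tiup --version` as the first argument.
if version_ge "v1.11.0" "1.11.0"; then
  echo "tiup is new enough"
else
  echo "upgrade tiup first (tiup update --self)"
fi
```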

| username: starCrush

The tiup version is v1.11. I only started this after reading that article.

| username: caiyfc

What is the cluster architecture like?

| username: starCrush

The cluster architecture is 3 TiDB, 3 PD, and 9 TiKV, based on the minimal deployment template. This issue hasn't been resolved yet, but in the meantime I have brought up a new cluster by setting the enable_tls parameter in the deployment template at deployment time.
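
For reference, enabling TLS at deployment time looks roughly like the sketch below, which writes a minimal topology fragment with enable_tls set in the global section. The hosts, user, and directories are hypothetical placeholders, and the TiDB and TiKV sections are omitted.

```shell
#!/bin/sh
# Writes a minimal topology fragment with TLS enabled at deployment time.
# Hosts and paths are hypothetical placeholders; only the global section
# and the PD hosts are shown.
write_topology() {
  cat > "$1" <<'EOF'
global:
  user: "tidb"
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"
  enable_tls: true

pd_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3
EOF
}

write_topology topology.yaml
# Then deploy with: tiup cluster deploy tidb-test v5.4.2 topology.yaml
```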

| username: caiyfc

Are your PD instances co-located on one IP? I checked, and in my successful example of enabling TLS, the PD instances are deployed on different IPs. Could this be related?

| username: starCrush

My PD instances are also on different machines, but each one shares a machine with a TiDB instance. There are three TiDB+PD machines and three TiKV machines.

| username: Min_Chen

Hello,

It is recommended to follow the steps in the official documentation: Enable TLS Between TiDB Components | PingCAP Docs

The command tiup cluster tls tidb-test enable is still in the experimental stage and is not recommended for production use at this time.