What is the relationship between PD server and etcd?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd server 和etcd之间什么关系? (What is the relationship between the PD server and etcd?)

| username: TIDB-Learner

What is the relationship between the PD server and etcd? What additional features or optimizations does PD have?

For example, etcd also defaults to port 2379, the same port PD uses, so that is surely no coincidence. Of course, we can’t simply dismiss the PD server module as a secondary development of etcd.

Looking forward to experts using simple and easy-to-understand language to describe the differences and connections between the two.

| username: 托马斯滑板鞋 | Original post link

Wrapped etcd with an outer layer = pd :upside_down_face:

| username: 像风一样的男子 | Original post link

The official explanation is that PD achieves fault tolerance by embedding etcd. Since etcd is embedded in PD, they share the same ports.

| username: TiDBer_小阿飞 | Original post link

PD embeds etcd, so it supports automatic failover out of the box and there is no single point of failure to worry about. At the same time, PD ensures strong data consistency through etcd’s Raft implementation, so there is no need to worry about data loss.

By default, etcd listens on ports 2379 and 2380: port 2379 handles client requests, while port 2380 is used for communication between etcd peers.
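The two default ports above map directly onto etcd’s configuration. A minimal, illustrative config fragment (not PD’s actual settings, just etcd’s own port options):

```yaml
# etcd configuration sketch: the two default ports
listen-client-urls: http://0.0.0.0:2379  # client/API traffic (the port PD also exposes)
listen-peer-urls: http://0.0.0.0:2380    # peer-to-peer Raft replication traffic
```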

| username: H810089 | Original post link

I believe the two are inherently not comparable. PD is the “brain” of the TiDB database, responsible for scheduling within the cluster. However, PD itself has some data that needs to be persisted, so it requires a storage layer for that data. Since etcd is a distributed KV store written in Go, it was easy to integrate into PD, which is why etcd was ultimately chosen.

Therefore, the relationship between PD and etcd is similar to the relationship between TiKV and RocksDB.

| username: TiDBer_H5NdJb5Q | Original post link

My understanding is that it’s mainly because PD has embedded etcd. Actually, it can also be separated, but the deployment would be more complicated. It’s similar to the relationship between Hadoop’s NameNode and ZooKeeper. You could say that NameNode uses ZooKeeper’s functionality, but you can’t say it’s just a layer on top of ZooKeeper.

| username: 小龙虾爱大龙虾 | Original post link

PD has embedded etcd, and the additional functionalities are those of PD, such as scheduling, TSO generation, and Region metadata storage.
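Of the PD-specific features listed above, TSO generation is the easiest to sketch: PD hands out 64-bit timestamps composed of a physical part (milliseconds) and an 18-bit logical counter. A toy Python illustration of that idea (real PD pre-allocates time windows through etcd so the leader can serve timestamps without a storage round-trip per request, and it guarantees monotonicity across leader failover; this sketch shows only the physical/logical split):

```python
import time

LOGICAL_BITS = 18  # low bits reserved for the logical counter, as in PD's TSO format


class SimpleTSO:
    """Toy timestamp oracle: physical millisecond clock + logical counter."""

    def __init__(self):
        self.physical = 0  # last physical timestamp seen (ms)
        self.logical = 0   # counter within the same millisecond

    def get_ts(self, now_ms=None):
        now = int(time.time() * 1000) if now_ms is None else now_ms
        if now > self.physical:
            # Clock advanced: reset the logical counter.
            self.physical, self.logical = now, 0
        else:
            # Same millisecond (or clock went backwards): bump the logical part
            # so timestamps stay strictly increasing.
            self.logical += 1
        return (self.physical << LOGICAL_BITS) | self.logical


tso = SimpleTSO()
a = tso.get_ts(now_ms=1000)
b = tso.get_ts(now_ms=1000)  # same millisecond, so only the logical part grows
assert b == a + 1
```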

| username: zhh_912 | Original post link

They are essentially the same thing: one contains the other.

| username: 小于同学 | Original post link

PD embeds etcd, which gives it automatic failover.

| username: 随缘天空 | Original post link

The PD server integrates etcd, which implements the Raft consensus algorithm to ensure high availability and data consistency of the service. Additionally, leveraging etcd’s fault detection and recovery mechanisms, PD can automatically handle node failures and perform corresponding reconfiguration and data migration, thereby achieving automatic failover.

| username: shigp_TIDBER | Original post link

The official explanation is that PD achieves fault tolerance by embedding etcd. Since etcd is embedded in PD, they share the same ports.

| username: 大飞哥online | Original post link

PD uses etcd as the storage backend to save the cluster’s metadata and scheduling information. PD stores data such as the cluster’s topology, Region distribution information, and scheduling policies in etcd to ensure data consistency and persistence.
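To make “PD uses etcd as the storage backend” concrete, here is a minimal in-memory stand-in for etcd’s KV interface (put / get / prefix scan). The key names are purely illustrative and are not PD’s actual key layout:

```python
class EtcdLikeStore:
    """Minimal in-memory stand-in for etcd's KV API: put, get, prefix scan."""

    def __init__(self):
        self._kv = {}

    def put(self, key, value):
        self._kv[key] = value

    def get(self, key):
        return self._kv.get(key)

    def get_prefix(self, prefix):
        # etcd range reads over a key prefix are how a PD-like component
        # would enumerate, e.g., all Region metadata at once.
        return sorted((k, v) for k, v in self._kv.items() if k.startswith(prefix))


store = EtcdLikeStore()
# Hypothetical metadata a PD-like component might persist:
store.put("/pd/cluster/id", "6789")
store.put("/pd/regions/1", '{"start_key": "", "end_key": "m", "leader_store": 1}')
store.put("/pd/regions/2", '{"start_key": "m", "end_key": "", "leader_store": 2}')

assert len(store.get_prefix("/pd/regions/")) == 2
```

Because etcd replicates every `put` through Raft, whichever PD node is elected leader after a failure sees the same metadata, which is the persistence-plus-consistency property described above.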

| username: 大飞哥online | Original post link

What new features or optimizations does PD have?

  1. Cluster Metadata Management: PD is responsible for managing the topology of the TiDB cluster, the distribution information of Regions, the scheduling of replicas, and other metadata to ensure the stable operation of the cluster.
  2. Scheduling Policies: PD implements various scheduling policies, such as Leader scheduling, Region scheduling, and replica scheduling, to achieve load balancing and fault recovery.
  3. Automated Operations and Maintenance: PD supports automated O&M functions, such as automatic failover and automatic replica scheduling, reducing the workload of O&M personnel.
  4. Elastic Scaling: PD supports elastic scaling of the cluster, allowing dynamic adjustment of the cluster’s size and configuration to meet different workload requirements.
  5. Fault Recovery: PD can quickly detect and handle faults in the cluster, such as node failures and abnormal Region states, ensuring high availability and stability of the cluster.
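The scheduling policies in point 2 can be sketched with a deliberately naive balancer: repeatedly move one Region from the most-loaded store to the least-loaded one until the counts are even. Real PD weighs Region size, labels, hot spots, and replica placement rules, not just counts; this is an illustration of the basic idea only:

```python
def balance_step(region_counts):
    """One step of a naive balance scheduler over {store_name: region_count}.

    Moves a single region from the most-loaded store to the least-loaded
    one; returns the (src, dst) pair, or None once balanced.
    """
    src = max(region_counts, key=region_counts.get)
    dst = min(region_counts, key=region_counts.get)
    if region_counts[src] - region_counts[dst] <= 1:
        return None  # within one region of even: nothing useful to move
    region_counts[src] -= 1
    region_counts[dst] += 1
    return (src, dst)


counts = {"store-1": 10, "store-2": 4, "store-3": 7}
while balance_step(counts):
    pass  # drive the scheduler until no move improves the balance
assert counts == {"store-1": 7, "store-2": 7, "store-3": 7}
```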

| username: TiDBer_RjzUpGDL | Original post link

PD integrates etcd.

| username: 健康的腰间盘 | Original post link

I understand it as an inclusion relationship.

| username: xingzhenxiang | Original post link

I see; just here to take notes for reference.