In the case of limited machines, with which component should TiCDC be co-located, TiDB or PD?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 机器有限情况下 TiCDC 更应该和谁混部,tidb or tipd ? (With limited machines, which should TiCDC be co-deployed with, tidb or tipd?)

| username: Jellybean

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] Any version
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Issue Phenomenon and Impact]
As the title suggests, due to recent business adjustments, our data high-availability architecture requires deploying a pair of TiDB clusters in a primary-secondary setup, with TiCDC handling real-time replication between them. However, applying for dedicated TiCDC machines for a single cluster is too costly.

The business profile is 80%+ TP traffic plus roughly 20% low-to-medium-intensity AP traffic, and the overall cluster write QPS is in the tens of thousands.

Currently, the plan is to achieve this goal through mixed deployment of TiCDC and other components. What are the best practices from the community experts? Which component should TiCDC be mixed with, tidb or tipd?

| username: 啦啦啦啦啦 | Original post link

We co-locate other services with the non-leader PD nodes, out of concern about affecting the business; the non-leader PD nodes don't serve requests anyway. If PD has issues on the leader node, the impact could be much greater. By that logic, deploying on a TiDB node might be even better, since at most it would only affect that single TiDB node.

| username: 小龙虾爱大龙虾 | Original post link

It is recommended to use NUMA for simple isolation in hybrid deployments to prevent excessive impact on other components.
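
For reference, per-instance NUMA binding can be expressed directly in the TiUP topology file. A minimal sketch, assuming PD and TiCDC share a host that has two NUMA nodes (the hosts, ports, and node numbers below are made up for illustration):

```yaml
# Hypothetical TiUP topology excerpt: pin co-located PD and TiCDC
# instances to different NUMA nodes on the same host.
pd_servers:
  - host: 10.0.1.11
    numa_node: "0"      # PD pinned to NUMA node 0
cdc_servers:
  - host: 10.0.1.11
    port: 8300
    numa_node: "1"      # TiCDC pinned to NUMA node 1
```

If I remember correctly, `numa_node` only takes effect when `numactl` is installed on the target machine.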

| username: Kongdom | Original post link

:thinking: I think TiDB is better, after all, TiDB is stateless, but PD is the brain.

| username: TIDB-Learner | Original post link

TiCDC is relatively resource-intensive, and PD machines are generally low-spec, so when the two are co-deployed the "brain" role is more prone to crashing. Personally, I feel that no matter which component it is mixed with, it comes down to how much resource headroom that component has; resource isolation is a must.

| username: 路在何chu | Original post link

CDC consumes a lot of memory, so it's relatively safer to place it together with PD, which is itself lightweight.

| username: xfworld | Original post link

I don’t recommend mixed deployment, don’t make things difficult for yourself :stuck_out_tongue_closed_eyes:

| username: 啦啦啦啦啦 | Original post link

If it's a physical machine, not mixing components is indeed a bit wasteful. Each of our 3 PD machines is also 112C, 512G, with a 3 TB NVMe SSD. Dedicating one of those to a non-leader PD is just too wasteful…

| username: Jellybean | Original post link

I also thought of this point and planned to deploy with the non-PD leader. However, in actual practice, there is a situation where the initial deployment is indeed on the non-leader node, but the PD leader might drift. If one day it drifts to the machine with TiCDC, how should this situation be handled? Should we manually intervene to transfer the leader away?

| username: Jellybean | Original post link

Everyone knows that mixed deployment should be avoided and carries risks, but we have no choice.

On one hand, there are business requirements.
On the other hand, there are ample resources sitting idle: dedicating 512 GB of memory and 112 CPU cores to a non-leader PD is just too wasteful…

| username: 啦啦啦啦啦 | Original post link

Yes, it can be switched manually. I specifically asked the official team about this before: barring some extreme situation, the probability of a PD leader switch happening during normal operation is very small.
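
For reference, the manual switch can be done with pd-ctl's `member leader transfer`. A minimal sketch (the tiup version, PD address, and member name are assumptions for illustration):

```bash
# List PD members and the current leader, then move the leader to another member.
tiup ctl:v7.5.0 pd -u http://10.0.1.11:2379 member
tiup ctl:v7.5.0 pd -u http://10.0.1.11:2379 member leader transfer pd-2
```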

| username: xfworld | Original post link

Split them up, whether through virtualization or cgroup isolation; that will work better.

However, I/O isolation is harder, and when contention does occur you will be stuck reacting to it.
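
As a rough illustration of the cgroup route, a minimal cgroup v2 sketch that caps the cdc server's CPU and memory (the paths, limits, and PID variable are assumptions and need adjusting to the actual machine):

```bash
# Create a dedicated cgroup for the cdc server and cap its CPU and memory.
mkdir -p /sys/fs/cgroup/ticdc
echo "800000 100000" > /sys/fs/cgroup/ticdc/cpu.max   # at most ~8 CPUs (800ms quota per 100ms period)
echo "64G" > /sys/fs/cgroup/ticdc/memory.max          # hard memory cap of 64 GiB
# CDC_PID is a placeholder for the PID of the running `cdc server` process.
echo "$CDC_PID" > /sys/fs/cgroup/ticdc/cgroup.procs
```

If the cluster is deployed with TiUP and TiCDC runs under systemd, setting `CPUQuota`/`MemoryMax` on its service unit should achieve the same effect.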

| username: Jellybean | Original post link

The I/O issue can be addressed by putting PD and TiCDC on different SSDs, which isolates their I/O resources.

It's hard to get approval for more machines, but relatively easy to get approval for extra drives.
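
A sketch of what that might look like in the TiUP topology, with the PD and TiCDC data directories on different mount points (the host and mount paths are assumptions):

```yaml
# Hypothetical topology excerpt: PD and TiCDC data on separate SSDs.
pd_servers:
  - host: 10.0.1.11
    data_dir: /data_ssd0/pd-2379     # PD data on one SSD
cdc_servers:
  - host: 10.0.1.11
    port: 8300
    data_dir: /data_ssd1/cdc-8300    # TiCDC sort data on a separate SSD
```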

| username: heiwandou | Original post link

It depends on PD's load; generally, PD nodes do not require many resources.

| username: xfworld | Original post link

Then just use the one machine: allocate enough CPU and memory, separate the disk I/O, and isolate with NUMA and cgroups.

| username: 有猫万事足 | Original post link

You can refer to this document to set the priority for PD. Set the commonly used PD to the highest priority, the backup to the second highest priority, and the one that mostly handles voting to the lowest priority.
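
For reference, the priority is set per PD member through pd-ctl's `member leader_priority` (a higher number means higher priority). A minimal sketch, where the member names, PD address, and tiup version are placeholders:

```bash
# pd-1 is the preferred leader, pd-2 the backup, and pd-3 (e.g. the one
# co-located with TiCDC) mostly just votes.
tiup ctl:v7.5.0 pd -u http://10.0.1.11:2379 member leader_priority pd-1 5
tiup ctl:v7.5.0 pd -u http://10.0.1.11:2379 member leader_priority pd-2 3
tiup ctl:v7.5.0 pd -u http://10.0.1.11:2379 member leader_priority pd-3 1
```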

In this case, even if the highest priority PD1 goes down, as long as PD1 comes back up, the leader will automatically switch back to PD1.

Of course, there are drawbacks. If PD1 restarts quickly, the PD leader switches twice in a short period, causing unnecessary jitter. Also, with priorities set, a manually switched PD leader will quickly revert back, which can be confusing if you don't realize why.

You can weigh the pros and cons to decide whether to do this.

| username: 小糊涂萌新 | Original post link

Mixing TiCDC with the TiDB cluster might be the most appropriate way to ensure real-time synchronization between the TiDB primary and secondary servers.

| username: Jellybean | Original post link

Adjusting the PD leader priority does have obvious drawbacks, since it can easily cause secondary impact on the cluster. However, if there are 3 PD machines and only 2 TiCDC nodes to deploy, you can mix the TiCDC nodes with the two non-leader PDs and give the remaining PD a dedicated machine; in that case you can set its priority to the highest.

So, this is also a good idea. Thanks for sharing.

| username: Jellybean | Original post link

Brother, what are you trying to express? I can’t understand the meaning of your statements.

| username: 像风一样的男子 | Original post link

How are the resources on the secondary cluster? Generally, the pressure on the secondary is relatively small, so you can try mixing TiCDC with the secondary cluster's servers. My TiCDC is deployed together with a TiDB node of the secondary cluster.