Can HAProxy + Keepalived + TiDB be used to build high availability in a production environment?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 生产环境中用Haproxy+Keepalive + TiDB构建高可用吗?

| username: abelard2008

[TiDB Usage Environment] Production
[TiDB Version] 6.5.0
[Reproduction Path] None
[Encountered Problem]
After testing a 3-node TiDB cluster for a period of time, and in order to avoid modifying our existing MySQL-based client code, we followed the HAProxy + Keepalived approach to TiDB high availability and tested the HAProxy + Keepalived + TiDB solution described in the article.
I would like to confirm: is HAProxy + Keepalived + TiDB the best high availability solution for TiDB in a production environment?
If we use this solution in production, what should we pay particular attention to?

Thank you!
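
For reference, a minimal haproxy.cfg for this kind of setup might look like the sketch below, along the lines of the TiDB documentation's HAProxy guide. The IPs, the listen port 3390, and the timeouts are placeholders, not real values:

```
global
    daemon
    maxconn 4096

defaults
    mode tcp
    timeout connect 5s
    timeout client  30m
    timeout server  30m

listen tidb-cluster
    bind *:3390                    # clients connect here instead of a single TiDB node
    mode tcp
    balance leastconn              # route to the TiDB instance with the fewest connections
    server tidb-1 10.0.1.101:4000 check inter 2000 rise 2 fall 3
    server tidb-2 10.0.1.102:4000 check inter 2000 rise 2 fall 3
    server tidb-3 10.0.1.103:4000 check inter 2000 rise 2 fall 3
```

Keepalived then moves a virtual IP between two such HAProxy nodes, so the application only ever sees the VIP.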
| username: ffeenn | Original post link

Given limited resources, this is the best option. If resources allow, it's better to use F5-style hardware load balancers or a cloud provider's ALB, SLB, etc. Choose the level of fault tolerance based on how important the business is.

| username: Kongdom | Original post link

Coincidentally, we are using this model as well and haven’t encountered any issues so far.

| username: abelard2008 | Original post link

Thank you!
Could you please share how many nodes you used to deploy these three components in your production environment?

| username: 考试没答案 | Original post link

I have a 3-node setup. We haven't put any load balancing in front of TiDB yet; it's been idle for a long time, and I'm planning to use SLB for load balancing.

| username: Kongdom | Original post link

We use the standard deployment, with 3 nodes for each component.

| username: abelard2008 | Original post link

It seems the discussion has gone a bit off track. My goal is high availability: without code changes, the MySQL client can only connect to one specific TiDB node, and if that node fails, the application goes down. With HAProxy + Keepalived, the client only needs to connect to the virtual IP, and failover is handled for me; of course, this also provides load balancing.

| username: abelard2008 | Original post link

Are HAProxy and Keepalived deployed on three servers independent of the TiDB cluster?

| username: tidb菜鸟一只 | Original post link

HAProxy and Keepalived are deployed on servers independent of the TiDB cluster; two nodes are enough, one primary and one backup.
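
For example, a minimal keepalived.conf sketch for the primary node might look like this (the interface name, VIP, and priorities are placeholders; the backup node uses `state BACKUP` and a lower priority):

```
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # exits 0 only while haproxy is running
    interval 2
    weight -20                    # lower this node's priority if the check fails
}

vrrp_instance VI_1 {
    state MASTER                  # BACKUP on the standby node
    interface eth0                # placeholder NIC name
    virtual_router_id 51
    priority 100                  # e.g. 90 on the standby
    advert_int 1
    virtual_ipaddress {
        10.0.1.100/24             # the VIP the application connects to
    }
    track_script {
        chk_haproxy
    }
}
```

If HAProxy dies on the primary, the check fails, its VRRP priority drops, and the VIP moves to the backup node.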

| username: 会飞的土拨鼠 | Original post link

You can set up 3 nodes for each component, and run Prometheus monitoring on a separate machine.

| username: Kongdom | Original post link

No, our resources are quite limited, so they are together with the nodes.

| username: 裤衩儿飞上天 | Original post link

If you have the budget, use F5 or SLB; if not, HAProxy + Keepalived with two nodes, one primary and one backup, is sufficient.

| username: buptzhoutian | Original post link

Here's an idea:

On the application side, deploy ProxySQL locally (it's very lightweight).

The application accesses the tidb-server through the local ProxySQL, for example via 127.0.0.1 or a Unix socket. The advantage is that the network path is short.

This does mean quite a few ProxySQL instances, but you can use a configuration management tool (such as Ansible) to manage their configurations.
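
As a rough sketch, registering the tidb-servers in the local ProxySQL admin interface (port 6032 by default) could look like this, with placeholder IPs; the application then connects to ProxySQL's MySQL port (6033 by default):

```sql
-- Run against the ProxySQL admin interface, e.g.:
--   mysql -h 127.0.0.1 -P 6032 -u admin -p
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (0, '10.0.1.101', 4000),
       (0, '10.0.1.102', 4000),
       (0, '10.0.1.103', 4000);

LOAD MYSQL SERVERS TO RUNTIME;   -- apply without restarting
SAVE MYSQL SERVERS TO DISK;      -- persist across restarts
```

ProxySQL health-checks the backends and stops routing to a failed tidb-server, and this per-host configuration is small enough to template with Ansible.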

For practices related to HAProxy and Keepalived, you can also refer to this :point_right: blog

| username: zhouzeru | Original post link

We set up two separate machines for HAProxy + Keepalived.

| username: 大鱼海棠 | Original post link

There are pros and cons. Using Keepalived to fail over HAProxy also adds operational burden and requires the network team to handle the VRRP traffic consistently; if the VIP fails to switch, the business is still affected. You could consider using HAProxy alone and dropping Keepalived.

| username: xingzhenxiang | Original post link

I use a single HAProxy. When a tidb-server's memory usage gets too high and it restarts, the frontend can still reconnect through HAProxy.

| username: 啦啦啦啦啦 | Original post link

Doesn't a single HAProxy make HAProxy itself a single point of failure? We also use HAProxy with Keepalived.

| username: TI表弟 | Original post link

I suggest working at layer 4: use an LB or LVS directly for connection-level load balancing and high availability. So far it has proven very usable and stable. An application-layer HAProxy solution brings more network overhead.
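
For reference, a minimal LVS setup with ipvsadm might look like the sketch below (placeholder VIP and backend IPs, direct-routing mode). Note that DR mode also requires binding the VIP on each real server's loopback and suppressing ARP for it:

```
# Create a TCP virtual service on the VIP with round-robin scheduling
ipvsadm -A -t 10.0.1.100:4000 -s rr

# Register each tidb-server as a real server in direct-routing (DR) mode
ipvsadm -a -t 10.0.1.100:4000 -r 10.0.1.101:4000 -g
ipvsadm -a -t 10.0.1.100:4000 -r 10.0.1.102:4000 -g
ipvsadm -a -t 10.0.1.100:4000 -r 10.0.1.103:4000 -g
```

Because LVS forwards at the connection level in kernel space, it avoids the extra user-space proxy hop that HAProxy adds.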