Syncing to Hadoop: Should I Choose Binlog or TiCDC? A Bit Confused, Seeking Expert Advice

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 同步到hadoop的方案选择,binlog和ticdc如何选择呢?有点小迷惑,请大师们指点

| username: 扬仔_tidb

We currently run several clusters on v5.x versions and want to synchronize their data to a Hadoop-based big data platform on Tencent Cloud. Should we use binlog or TiCDC for this?
According to the official documentation, binlog conflicts with many features.
Our current idea is to have TiCDC write changes to Kafka, then have a Flink job consume the Kafka topic and write to Hadoop.
If anyone has built a similar pipeline, I would appreciate some guidance.
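To make the Kafka-to-Hadoop step more concrete, here is a minimal sketch of the parsing a downstream consumer (for example, the Flink job) would need to do on each message. It assumes the changefeed emits messages in a Canal-JSON-style layout with the fields `database`, `table`, `type`, `isDdl`, `es`, `data`, and `sql`. The function name `flatten_change_event` and the sample payload are hypothetical and only for illustration; check the message format your TiCDC version actually produces before relying on these field names.

```python
import json
from typing import Any, Dict, List


def flatten_change_event(raw: str) -> List[Dict[str, Any]]:
    """Turn one Canal-JSON-style message into a list of flat row operations.

    Field names here are an assumption about the message layout,
    not something confirmed in this thread.
    """
    event = json.loads(raw)

    if event.get("isDdl"):
        # DDL messages carry a statement instead of row data, so they are
        # passed through for separate handling (e.g. schema evolution).
        return [{
            "op": "DDL",
            "database": event.get("database"),
            "table": event.get("table"),
            "sql": event.get("sql", ""),
        }]

    rows = []
    # "data" may hold several rows for one message; emit one record per row.
    for row in event.get("data") or []:
        rows.append({
            "op": event["type"],          # INSERT / UPDATE / DELETE
            "database": event["database"],
            "table": event["table"],
            "event_ms": event.get("es"),  # assumed: event time in milliseconds
            "row": row,
        })
    return rows


# Hypothetical sample message, for illustration only.
sample = json.dumps({
    "database": "shop",
    "table": "orders",
    "type": "INSERT",
    "isDdl": False,
    "es": 1650000000000,
    "data": [{"id": "1", "name": "a"}],
})

ops = flatten_change_event(sample)
print(ops)
```

Keeping DDL events separate from row events matters here because the sink side (Hive tables or files on HDFS) usually needs an explicit schema change rather than a row write.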

| username: db_user

If you are on version 5, CDC is the better choice, since binlog is no longer maintained. CDC is the mainstream option now, but it has its own issues. For example, DDL operations can increase memory usage, large transactions can lead to higher latency, and changefeed tasks can sometimes affect GC. Use it with care, and search the forum for related posts.

| username: 扬仔_tidb

Is binlog no longer maintained? I did not see anything about that in the documentation.

| username: db_user

I may not have described that accurately. What I meant is that starting with version 5, the official team began promoting CDC.
