How to Synchronize TiDB Data to Elasticsearch Using TiCDC

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 如何 同步tidb的数据 到elastic search

| username: TiDBer_cVnQnF8a

【TiDB Usage Environment】Production Environment
【TiDB Version】6.1.0
【Encountered Issue: Issue Phenomenon and Impact】How does TiCDC synchronize TiDB data to Elasticsearch? The official documentation only covers synchronization to MySQL and Kafka. Is direct synchronization to Elasticsearch supported?

| username: Billmay表妹

A common method is to use TiCDC to replicate TiDB’s incremental data to Kafka, and then use the Elasticsearch sink connector for Kafka Connect to write that data from Kafka to Elasticsearch.

Here is a simple step-by-step example:

  1. Use TiCDC to replicate TiDB’s incremental data to Kafka. Follow the official documentation to configure and start TiCDC with a Kafka sink, so that change events are written to a Kafka topic.
  2. Install and configure Kafka Connect, and add the Elasticsearch connector to Kafka Connect’s plugin directory. Then create a connector configuration that names the Kafka topic as the source and Elasticsearch as the destination.
  3. Start Kafka Connect and launch the Elasticsearch connector with that configuration. The connector reads data from the Kafka topic and writes it to Elasticsearch.
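
Steps 2 and 3 could look roughly like the sketch below, which builds a connector definition and submits it to the Kafka Connect REST API. It assumes Confluent's Elasticsearch sink connector (class `io.confluent.connect.elasticsearch.ElasticsearchSinkConnector`) is installed; the connector name, topic name, and URLs are hypothetical placeholders, and the exact settings you need depend on the message protocol TiCDC is configured to emit.

```python
import json
import urllib.request


def build_es_sink_config(name, topic, es_url):
    """Build the JSON payload that defines an Elasticsearch sink connector."""
    return {
        "name": name,
        "config": {
            # Assumption: Confluent's Elasticsearch sink connector is installed.
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "tasks.max": "1",
            "topics": topic,            # Kafka topic that TiCDC writes to
            "connection.url": es_url,   # Elasticsearch HTTP endpoint
            # Derive document IDs from topic/partition/offset instead of the
            # record key. Every change event then becomes a new document, so
            # real deployments usually need a key mapping instead.
            "key.ignore": "true",
            # Index the JSON payload without requiring a Connect schema.
            "schema.ignore": "true",
        },
    }


def register_connector(connect_url, payload):
    """POST the connector definition to the Kafka Connect REST API."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical usage (8083 is Kafka Connect's default REST port):
# payload = build_es_sink_config("tidb-es-sink", "tidb-changes", "http://localhost:9200")
# register_connector("http://localhost:8083", payload)
```

Note that with these settings the connector indexes each Kafka message as-is, so the documents mirror whatever envelope TiCDC produces. Reshaping them into plain rows would need an additional transform.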

With this method, TiDB’s incremental data flows to Elasticsearch through TiCDC and Kafka Connect.

| username: TiDBer_cVnQnF8a

Thank you for the reply. I already know about this method. What I want to know is whether there is a direct way to connect to Elasticsearch.

| username: 像风一样的男子

There is currently no direct synchronization method. You can only take an indirect approach: TiCDC → Kafka → Elasticsearch.

| username: tidb菜鸟一只

You can try Canal Cloud.

| username: Fly-bird

The data formats are different, so the data cannot be sent directly to ES.

| username: Hacker007

Currently, there isn’t one. You can only consume the binlog and write to ES with your own code.
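
As a rough illustration of the "write through code" approach, the sketch below converts one change event into lines for the Elasticsearch bulk API. It assumes the events arrive as JSON in the Canal-style layout (fields `database`, `table`, `pkNames`, `isDdl`, `type`, `data`), which is what TiCDC's `canal-json` protocol is expected to emit. The function name and the index naming scheme are hypothetical, and reading from Kafka and sending the request to ES are left out.

```python
import json


def canal_json_to_bulk(message):
    """Convert one Canal-style JSON change event into bulk API lines."""
    event = json.loads(message)
    if event.get("isDdl"):
        return []  # schema changes carry no row data to index

    pk_names = event.get("pkNames") or []
    if not pk_names:
        raise ValueError("a primary key is needed to build a stable document ID")

    # Hypothetical naming scheme; Elasticsearch index names must be lowercase.
    index = f'{event["database"]}_{event["table"]}'.lower()

    lines = []
    for row in event.get("data") or []:
        doc_id = "_".join(str(row[pk]) for pk in pk_names)
        meta = {"_index": index, "_id": doc_id}
        if event["type"] == "DELETE":
            lines.append(json.dumps({"delete": meta}))
        else:
            # INSERT and UPDATE both overwrite the whole document.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(row))
    return lines
```

The returned lines would be joined with newlines, terminated with a final newline, and sent to the `_bulk` endpoint. Using the primary key as the document ID makes replays idempotent, which matters because change feeds can deliver the same event more than once.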

| username: zxgaa

You can send the data to Kafka and then use Logstash to consume it into ES.

| username: ShawnYan

If I read the data out with Java and then write it into ES, does that count as a direct pass-through?

| username: ShawnYan

This idea is interesting; it has truly inherited the essence of ELK.

| username: dba远航

There needs to be an intermediate conversion process.

| username: ShawnYan

This method is the most stable, using kafka-connect-elasticsearch.

| username: andone

You can use Kafka for synchronization.