Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiCDC canal-json 协议下 WATERMARK Event 消息是什么? (What is the WATERMARK Event message under the TiCDC canal-json protocol?)
What is a WATERMARK Event? Can someone show what a WATERMARK Event looks like? Also, what is the business significance of this WATERMARK Event?
The Canal-JSON protocol was originally designed for MySQL and does not include important fields such as TiDB’s unique CommitTS transaction identifier. To address this issue, TiCDC attaches TiDB extension fields in the Canal-JSON protocol format. When `enable-tidb-extension` is set to `true` in the `sink-uri`, TiCDC’s behavior when generating Canal-JSON messages is as follows:
- DML Event and DDL Event messages sent by TiCDC will contain a field named `_tidb`.
- TiCDC will send WATERMARK Event messages.
I consumed one of these messages (shown below), but I don’t understand what business value it has. Hoping an expert can explain.
{
  "id": 0,
  "database": "",
  "table": "",
  "pkNames": null,
  "isDdl": false,
  "type": "TIDB_WATERMARK",
  "es": 1659085352032,
  "ts": 1659085354502,
  "sql": "",
  "sqlType": null,
  "mysqlType": null,
  "data": null,
  "old": null,
  "_tidb": {
    "watermarkTs": 434919270523076621
  }
}
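To relate the watermarkTs in the sample message to wall-clock time, it can help to decode it. A TiDB TSO keeps a physical timestamp in milliseconds in its high bits and an 18-bit logical counter in its low bits. The sketch below splits the value accordingly; the function name `decode_tso` is hypothetical and not part of any TiCDC client library.

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # low bits of a TiDB TSO hold the logical counter


def decode_tso(tso: int) -> tuple[int, int]:
    """Split a TSO into (physical milliseconds, logical counter)."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    return physical_ms, logical


# watermarkTs taken from the sample message above
physical_ms, logical = decode_tso(434919270523076621)
print(physical_ms, logical)  # 1659085352032 13
# Human-readable UTC time of the physical part
print(datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc))
```

For this sample, the physical part (1659085352032) is the same value as the message's `es` field.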
Hello, the watermarkTs field carried by a WATERMARK Event plays the same role as a watermark in stream processing: it tells the consumer how far the change stream has progressed, so a downstream job can tell when the data up to a given point is complete. There is a blog post about watermarks in Flink that you can study: Flink最佳实践 - Watermark原理及实践问题解析_event time skew-CSDN博客
Additionally, on the implementation side, TiCDC has a design document that can help you understand it: tiflow/docs/distributed-scheduling.md at 7e5026f844fbcaacd56df764591cb489ce53ecfb · pingcap/tiflow · GitHub
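The watermark lets a consumer handle anomalies such as duplicated and out-of-order messages. Once a WATERMARK Event arrives, all events with a commitTs less than its watermarkTs have already been sent, so a later event with a smaller commitTs is a resend (TiCDC delivers at least once) and can be safely ignored. See the answer at TiCDC事件的表级排序的问题 - #5,来自 neilshen - TiDB 的问答社区.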
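As a rough illustration of the deduplication use described above, here is a minimal consumer-side sketch. It assumes messages from a single partition are processed in arrival order, and that DML events carry `_tidb.commitTs` when the TiDB extension is enabled. The class name `WatermarkFilter` and its `accept` method are hypothetical, and a real consumer would also need to handle DDL events and multiple partitions.

```python
import json


class WatermarkFilter:
    """Drops events whose commitTs is below the latest watermark seen."""

    def __init__(self) -> None:
        self.watermark_ts = 0  # highest watermarkTs received so far

    def accept(self, raw: str) -> bool:
        """Return True if the message should be applied downstream."""
        msg = json.loads(raw)
        ext = msg.get("_tidb") or {}

        if msg.get("type") == "TIDB_WATERMARK":
            # Advance the watermark; the event itself carries no row data.
            self.watermark_ts = max(self.watermark_ts, ext["watermarkTs"])
            return False

        commit_ts = ext.get("commitTs")
        if commit_ts is not None and commit_ts < self.watermark_ts:
            # Everything below the watermark was already sent: a resend.
            return False
        return True


f = WatermarkFilter()
print(f.accept('{"type": "INSERT", "_tidb": {"commitTs": 100}}'))            # True
print(f.accept('{"type": "TIDB_WATERMARK", "_tidb": {"watermarkTs": 200}}')) # False
print(f.accept('{"type": "INSERT", "_tidb": {"commitTs": 150}}'))            # False
print(f.accept('{"type": "INSERT", "_tidb": {"commitTs": 250}}'))            # True
```

The comparison uses strict less-than, so an event whose commitTs equals the watermark is passed through rather than dropped; this is the conservative choice if downstream writes are idempotent.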