Observability of a transaction’s replication status in TiCDC

Fredrik_Bromee · June 21, 2023, 1:46pm

Hi!

This is a general design question about using TiCDC as a method for disaster recovery by replicating state to one or more downstream TiDB clusters.

I would like to commit a transaction in a local TiDB cluster and use TiCDC to replicate it to one or more downstream TiDB clusters in different regions, roughly 100 ms away (round-trip latency). I would like to know when the transaction has also been stored in the downstream TiDB. Is this possible? I then want to perform some action in the original region.

For example, can I subscribe to the replication status, getting notifications with latest replicated transaction ids?

The reason for doing it this way is that I need to work within an existing architecture framework with application logic that I cannot change. The framework does lots of small local commits and replicates transactions in batches to get high throughput. I want to replace the persistence layer with TiDB.

Thanks in advance,
Fredrik Bromee

TiDB-AI-Assistant · June 21, 2023, 1:46pm

Hi Fredrik,

Thank you for your question. Yes, it is possible to observe the replication status of a transaction in TiCDC. TiCDC provides a monitoring API that allows you to monitor the replication status of a changefeed. You can use this API to get the latest replicated transaction IDs and other information about the replication status.

To use the monitoring API, you need to enable the Prometheus exporter in TiCDC and configure Prometheus to scrape the metrics. Once you have done this, you can use a monitoring tool like Grafana to visualize the metrics and set up alerts based on the replication status.

In your case, you can set up an alert that triggers when the latest replicated transaction ID in the downstream TiDB(s) matches the transaction ID of the transaction you committed in the local TiDB. This will indicate that the transaction has been successfully replicated to the downstream TiDB(s).

I hope this helps. Let me know if you have any further questions.

winkyao · June 22, 2023, 7:14am

Hi Fredrik, it sounds like you need a more detailed solution. I am not entirely familiar with the mechanism of CDC. If your issue remains unresolved, please feel free to leave a message.

Fredrik_Bromee · June 22, 2023, 7:51am

Hi Wink. The robo reply is a good start! I think I’d need something a little more robust than going via a Grafana alert, though.

I tried searching for the monitoring API to see if I could poll it directly but I have not found it yet. I’d appreciate any pointers.

Maybe I could query or subscribe to the status of the downstream TiDB cluster? Are transaction IDs preserved across CDC, and are they monotonically increasing? If they are, I could ask whether the downstream cluster has committed a transaction equal to or later than the one I’m waiting for. This would also assume that transaction order is preserved across CDC.
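To make the idea concrete, here is a minimal sketch in Python. It assumes that some source exposes a single monotonically increasing "checkpoint" timestamp, and that everything committed at or before that timestamp has already been applied downstream. Whether TiCDC or the downstream cluster actually offers this guarantee is exactly the open question. `fetch_checkpoint_ts` is a hypothetical callable supplied by the caller, not a TiCDC API, and the timeout and polling values are placeholders.

```python
import time
from typing import Callable


def is_replicated(commit_ts: int, checkpoint_ts: int) -> bool:
    """True if the checkpoint has reached or passed the commit timestamp.

    Assumption: every transaction with a timestamp <= checkpoint_ts has
    already been applied downstream.
    """
    return checkpoint_ts >= commit_ts


def wait_until_replicated(
    commit_ts: int,
    fetch_checkpoint_ts: Callable[[], int],  # hypothetical: returns the latest checkpoint
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.05,
) -> bool:
    """Poll the checkpoint until it covers commit_ts or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_replicated(commit_ts, fetch_checkpoint_ts()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

Since the framework replicates in batches, one polling loop could serve many waiting transactions by releasing every transaction whose timestamp is at or below the newest checkpoint.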

At this point, I’m investigating feasibility so I don’t need to have an exact solution - more important for me is to find out if it would be possible or not.

Thanks,
Fredrik

winkyao · June 22, 2023, 8:14am

No, the API is not part of the monitoring system (Grafana or Prometheus). It is an open protocol API in TiCDC: TiCDC Open Protocol | PingCAP Docs

Transaction IDs are called TS here: a logical timestamp that represents the transaction ID in TiDB. It is preserved across CDC and all TiDB components, and it uniquely identifies a transaction. I do not recommend directly comparing TS (timestamps) or transaction IDs, as this is internal logic of TiDB.
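For context, a TS is usually described as a 64-bit value with a physical time in milliseconds in the high bits and an 18-bit logical counter in the low bits. The sketch below only illustrates that layout so a TS can be read as a wall-clock time for debugging. The 18-bit split is an assumption you should verify against the TiDB documentation for your version, and the helper names are hypothetical, not part of any TiDB client library.

```python
from datetime import datetime, timezone
from typing import Tuple

# Assumed layout: low 18 bits are a logical counter, the rest is Unix time in ms.
LOGICAL_BITS = 18


def split_ts(ts: int) -> Tuple[int, int]:
    """Split a TS into (physical milliseconds, logical counter)."""
    physical_ms = ts >> LOGICAL_BITS
    logical = ts & ((1 << LOGICAL_BITS) - 1)
    return physical_ms, logical


def compose_ts(physical_ms: int, logical: int = 0) -> int:
    """Build a TS from its parts (inverse of split_ts)."""
    return (physical_ms << LOGICAL_BITS) | logical


def ts_to_datetime(ts: int) -> datetime:
    """Return the physical part of a TS as a UTC datetime."""
    physical_ms, _ = split_ts(ts)
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)
```

Under this layout a larger TS means a later physical time or a later logical counter within the same millisecond, but application code that relies on this is depending on an internal detail.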

BTW, your goal is an advanced requirement. If necessary, I can find a PingCAP engineer to help you work out the solution in detail. Technically speaking, it is at least feasible, but if you subscribe to CDC and implement the logic yourself, I am not sure whether there are any risks. I am not a CDC expert, so I cannot give you better advice on this.