Will a CDC task continue running after being stopped for a period of time and then restarted?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc 任务停止一段时间,重启后还会继续运行嘛

| username: Raymond

I'd like to ask: if a CDC task is stopped for a few days, can it continue running after it is restarted?
My understanding is that CDC reads the raft log to synchronize data, and the raft log has a retention mechanism. If the task is stopped for a few days, could the raft log already have been deleted, so that CDC cannot continue after the restart?

| username: Raymond

What I wrote above is incorrect. This should be determined by TiCDC's GC behavior, not by the raft log.

| username: 裤衩儿飞上天

Take a look at gc-ttl.

| username: Raymond

Sure, thank you. One more question: when TiCDC synchronizes data, it should write that data to disk, right? But I don't see a data directory under /tidb-data/cdc-8300. Where does CDC store its data?

| username: 裤衩儿飞上天

In the data_dir you configured

| username: Raymond

My data_dir is set to the default, /tidb-data/cdc-8300, but I only see a tmp directory inside it.

| username: 裤衩儿飞上天

Has the changefeed started normally?

| username: Raymond

It has started, and its status is normal. Does this mean that TiCDC does not store the data, but converts it directly into SQL and sends it downstream over the network?

| username: Raymond

I don't see any CDC data written to disk on any of the CDC nodes, upstream or downstream.

| username: 裤衩儿飞上天

By default, the sorter uses memory first and spills to disk only when memory is insufficient. Once the data has been consumed, it is no longer kept.
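
The memory-first, spill-when-full behavior described in this reply can be illustrated with a small sketch. This is not TiCDC's sorter; the class name SpillingSorter, the (timestamp, payload) event shape, and the max_in_memory quota are all hypothetical and only show the general idea: buffer in memory, write sorted runs to temporary files when the buffer overflows, and delete those files once the events have been consumed.

```python
import heapq
import os
import pickle
import tempfile
from typing import Iterator, List, Tuple

# Hypothetical event shape: (commit timestamp, payload).
Event = Tuple[int, str]


class SpillingSorter:
    """Sort events by timestamp, staying in memory until a quota is exceeded."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory  # assumed quota, counted in events
        self.buffer: List[Event] = []
        self.spill_files: List[str] = []

    def add(self, event: Event) -> None:
        self.buffer.append(event)
        if len(self.buffer) > self.max_in_memory:
            self._spill()

    def _spill(self) -> None:
        # Write the current buffer to a temporary file as one sorted run.
        self.buffer.sort()
        fd, path = tempfile.mkstemp(prefix="sorter-", suffix=".run")
        with os.fdopen(fd, "wb") as f:
            for event in self.buffer:
                pickle.dump(event, f)
        self.spill_files.append(path)
        self.buffer = []

    @staticmethod
    def _read_run(path: str) -> Iterator[Event]:
        # Stream one sorted run back from disk, one event at a time.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    def drain(self) -> Iterator[Event]:
        """Yield every event in timestamp order, then delete spilled runs."""
        self.buffer.sort()
        runs = [self._read_run(p) for p in self.spill_files]
        try:
            # Merge the in-memory run with all on-disk runs.
            yield from heapq.merge(self.buffer, *runs)
        finally:
            # Consumed data is not kept: remove the temporary files.
            for path in self.spill_files:
                os.remove(path)
            self.spill_files = []
            self.buffer = []
```

With a large enough quota, nothing is ever written to disk, which would match seeing only an otherwise empty tmp directory under data_dir while the changefeed keeps up.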

| username: Raymond

I think so too. In my case it probably never needs to spill to disk.

I have another question. Suppose the TiDB system variable tidb_gc_life_time is 2h and TiCDC's gc-ttl is 24h. Will TiCDC's gc-ttl block TiDB's normal GC behavior (effectively changing TiDB's GC lifetime to 24h)?

| username: 裤衩儿飞上天

After the task stops, the data is retained for 24 hours. See the description of gc-ttl that I pointed you to.

| username: Raymond

I tested it myself. TiCDC's gc-ttl is 24 hours and my tidb_gc_life_time is 2 hours. After I manually paused the CDC task, the monitoring and logs show that TiKV's GC keeps advancing. What's going on?

| username: 裤衩儿飞上天

Could you please share your testing process, including the configuration, the exact commands, and their output?

| username: Raymond

Hello, I tested again later and found that if TiCDC still has unsynchronized data, stopping CDC does block TiKV GC. Thank you for your reply, and Happy New Year.
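
One way to picture how tidb_gc_life_time and gc-ttl interact is the simplified model below. It is only a sketch under stated assumptions, not the actual PD or TiCDC logic: the function name effective_gc_safepoint and its parameters are hypothetical, and it assumes that a stopped changefeed holds GC back at its checkpoint until the hold lapses after gc-ttl.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_gc_safepoint(
    now: datetime,
    gc_life_time: timedelta,
    changefeed_checkpoint: Optional[datetime],
    stalled_since: Optional[datetime],
    gc_ttl: timedelta,
) -> datetime:
    """Return the newest point up to which old versions may be collected.

    Simplified model (assumption): a stopped changefeed keeps GC from
    passing its checkpoint, but only for gc_ttl after it stopped.
    """
    # What TiDB would choose on its own: anything older than gc_life_time.
    tidb_target = now - gc_life_time

    # No stopped changefeed, so nothing holds GC back.
    if changefeed_checkpoint is None or stalled_since is None:
        return tidb_target

    # The hold kept on behalf of the changefeed lapses after gc_ttl.
    if now - stalled_since > gc_ttl:
        return tidb_target

    # While the hold is alive, GC cannot pass the changefeed's checkpoint.
    return min(tidb_target, changefeed_checkpoint)
```

Under this model, with tidb_gc_life_time at 2h and gc-ttl at 24h, GC keeps advancing while the checkpoint of the paused task is less than 2h old, then stops at the checkpoint, and moves on again once the task has been paused for more than 24h. That may be why GC first appeared to advance normally in the test above.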
