Issues with Using Ti-CDC for Synchronization

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于使用Ti-CDC同步的问题

| username: 泰迪比爱好者

My TiDB cluster currently has two CDC nodes. Monitoring shows that the disk on one of them is already full, and the space is taken up by sorter temporary files, as shown in the image below:
[image]
The other CDC node is functioning normally. Has anyone encountered a similar issue?

| username: Fly-bird | Original post link

Do you have a lot of CDC tasks running?

| username: zxgaa | Original post link

Is the disk you provisioned too small? You can also throttle the read speed to reduce disk usage.

| username: zxgaa | Original post link

You can check the task list to see if there are any errors.

| username: 泰迪比爱好者 | Original post link

There are no errors, and it is still synchronizing normally. I just feel that a full disk is always a potential risk.

| username: 泰迪比爱好者 | Original post link

I have two questions: First, what are sorter temporary files, and why are they only present on one node and not the other? Second, is there any way to control these sorter files?

| username: 泰迪比爱好者 | Original post link

It’s okay, there isn’t much data.

| username: 像风一样的男子 | Original post link

Unified Sorter is the sorting engine in TiCDC. It is designed to alleviate out-of-memory (OOM) problems in the following scenarios:

  • A TiCDC data subscription task (changefeed) is paused or interrupted for a long time, so a large amount of incremental update data accumulates and has to be synchronized.
  • A data subscription task is started from an earlier point in time while business write volume is high, so a large backlog of update data has to be synchronized.

| username: 像风一样的男子 | Original post link

It is best to expand the disk.
For production environments, it is recommended that the available disk space on each node be greater than (the maximum checkpoint-ts delay the business can tolerate) × (peak upstream write traffic). In addition, if a large amount of historical data is expected to be synchronized after the changefeed is created, make sure that the free capacity of each node is greater than or equal to the amount of data to be synchronized.
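As a rough illustration of that sizing rule, here is a minimal Python sketch. The function name and the example numbers (a one-hour tolerated delay and 50 MiB/s of peak writes) are assumptions made up for the example, not values from this cluster.

```python
def required_sorter_disk_gib(max_checkpoint_lag_s: float,
                             peak_write_mib_per_s: float) -> float:
    """Lower bound on free disk: tolerated checkpoint-ts delay x peak write traffic."""
    return max_checkpoint_lag_s * peak_write_mib_per_s / 1024  # MiB -> GiB


# Hypothetical workload: up to 1 hour of delay at 50 MiB/s peak upstream writes.
needed = required_sorter_disk_gib(3600, 50)
print(f"at least {needed:.1f} GiB free per node")
```

The result is only a floor. Catching up on historical data after creating a changefeed needs additional room, as noted above.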

| username: Jellybean | Original post link

CDC captures the TiKV change data of the entire cluster and guarantees TSO order when it synchronizes that data downstream. To do this, CDC has to sort the potentially out-of-order data coming from different TiKV nodes. When the data volume is small, the sort can be done in memory. When the data volume is large, the data has to be written to disk to complete the sort. This is a protective design to avoid OOM. So the sorter temporary files are, as the name suggests, the working files of an external sort.
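To make the idea concrete, here is a minimal Python sketch of an external sort that spills sorted runs to a directory and merges them back in timestamp order. It illustrates the general technique only and is not TiCDC's actual implementation. The names `external_sort`, `sort_dir`, and `max_in_memory` are hypothetical.

```python
import heapq
import os
import pickle
import tempfile
from operator import itemgetter


def _spill(sorted_run, sort_dir):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=sort_dir, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for event in sorted_run:
            pickle.dump(event, f)
    return path


def _read_run(path):
    """Stream events back from a spilled run, then delete the file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break
    os.remove(path)


def external_sort(events, sort_dir, max_in_memory=1000):
    """Yield (commit_ts, payload) events in commit_ts order, spilling to disk."""
    by_ts = itemgetter(0)
    runs, buffer = [], []
    for event in events:
        buffer.append(event)
        if len(buffer) >= max_in_memory:
            # Memory budget reached: sort this batch and write it out.
            runs.append(_spill(sorted(buffer, key=by_ts), sort_dir))
            buffer = []
    streams = [_read_run(p) for p in runs]
    streams.append(iter(sorted(buffer, key=by_ts)))
    # K-way merge of the sorted runs; files are removed only once fully read.
    yield from heapq.merge(*streams, key=by_ts)
```

In this sketch the temporary files disappear only after the consumer has read them to the end. If the consumer stalls, the spilled runs stay on disk.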

  • If the server uses mechanical hard drives or other storage devices with latency or throughput bottlenecks, the performance of Unified Sorter will be significantly affected.
  • Unified Sorter uses data_dir to store temporary files by default. It is recommended to ensure that the free capacity of the hard drive is greater than or equal to 500 GiB. For production environments, it is recommended to ensure that the available disk space on each node is greater than (the maximum allowable checkpoint-ts delay) × (peak upstream write traffic). Additionally, if a large amount of historical data is expected to be synchronized after the changefeed is created, ensure that the free capacity on each node is greater than the amount of data to be caught up.

| username: 泰迪比爱好者 | Original post link

Here I have two more questions: 1. Will these sorter files be automatically deleted after data synchronization is completed? 2. Why are there sorter files only on the first node and not on the second node?

| username: 像风一样的男子 | Original post link

The sorter files are deleted once the data has been synchronized downstream. Is the task on the CDC node with the accumulated sorter files stuck?

| username: 泰迪比爱好者 | Original post link

It is not stuck. I can see that CDC is still writing logs.

| username: 像风一样的男子 | Original post link

Can you check if the TSO of the CDC task has advanced?
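One way to judge whether the checkpoint is advancing is to decode it into wall-clock time. A TiDB TSO stores a physical timestamp in milliseconds in its high bits and an 18-bit logical counter in its low bits. The helper names in this sketch are hypothetical, and how you obtain the checkpoint TSO of your changefeed depends on your TiCDC version.

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def tso_to_datetime(tso: int) -> datetime:
    """Return the UTC wall-clock time encoded in the physical part of a TSO."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def checkpoint_lag_seconds(checkpoint_tso: int, now: datetime) -> float:
    """Seconds between the checkpoint's physical time and `now`."""
    return (now - tso_to_datetime(checkpoint_tso)).total_seconds()
```

Sampling the checkpoint TSO twice, a minute or so apart, and comparing the decoded times shows whether it moves. If the lag keeps growing while the decoded time stays the same, the task is likely stuck.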

| username: 泰迪比爱好者 | Original post link

No, it has not advanced, but the data is being synchronized.

| username: 泰迪比爱好者 | Original post link

I suspect that my gc-ttl setting is too long, which caused some subsequent issues.

| username: TiDBer_小阿飞 | Original post link

--sort-dir: Specifies the temporary file directory used by the sorting engine. Using this option in cdc cli changefeed create is not recommended; set it in the cdc server command instead. The default value of this configuration item is /tmp/cdc_sort. When Unified Sorter is enabled, manually specify sort-dir if this directory on the server is not writable or does not have enough free space. If the directory that sort-dir points to is not writable, the changefeed stops automatically.
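Before pointing sort-dir at a directory, a quick pre-check of writability and free space can catch both failure modes described above. This is a small Python sketch. The function name and the threshold are assumptions for the example, and the check runs as the current user, which may differ from the user running cdc server.

```python
import os
import shutil


def check_sort_dir(path: str, min_free_gib: float) -> list:
    """Return a list of problems found with a sorter directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free_gib = shutil.disk_usage(path).free / 1024**3
    if free_gib < min_free_gib:
        problems.append(
            f"{path} has {free_gib:.1f} GiB free, below {min_free_gib} GiB"
        )
    return problems
```

For example, `check_sort_dir("/tmp/cdc_sort", 500)` would test the default directory mentioned above against the 500 GiB recommendation quoted earlier in this thread.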

| username: 泰迪比爱好者 | Original post link

I am using tiup to manage cdc, and I found that the cdc on that node stopped because the disk was full.

| username: 泰迪比爱好者 | Original post link

I just took a closer look, and it is indeed stuck.