My TiDB cluster currently has two CDC nodes. Monitoring shows that the disk on one of them is already full, and the space is taken up by sorter temporary files.
The other CDC node is functioning normally. Has anyone encountered a similar issue?
I have two questions: First, what are sorter temporary files, and why are they only present on one node and not the other? Second, is there any way to control these sorter files?
Unified Sorter is a sorting engine feature in TiCDC designed to alleviate memory overflow issues caused by the following scenarios:
If the TiCDC data subscription task is paused or interrupted for an extended period, a large amount of incremental update data accumulates and needs to be synchronized.
When starting a data subscription task from an earlier point in time, a high volume of business writes results in a large accumulation of update data that needs to be synchronized.
It is best to expand the disk.
For the production environment, it is recommended to ensure that the available disk space on each node is greater than the (maximum allowed by the business) checkpoint-ts delay * peak upstream write traffic. Additionally, if a large amount of historical data is expected to be synchronized after the changefeed is created, please ensure that the free capacity of each node is greater than or equal to the data to be synchronized.
CDC captures the TiKV change data of the entire cluster and guarantees TSO order when synchronizing it downstream. To do that, CDC has to sort the potentially out-of-order data arriving from different TiKV nodes. When the data volume is small, the sort can be done in memory; when the data volume is large, the data has to be written to disk to complete the sort. This is a protective design to avoid OOM. So the sorter temporary files are, as the name suggests, the on-disk files used for external sorting.
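To make the idea of external sorting concrete, here is a minimal Python sketch of the general technique. It is not TiCDC's actual implementation (TiCDC is written in Go and its sorter is more involved); the function name, the (commit_ts, payload) event shape, and the max_in_memory threshold are all made up for illustration. Events are buffered in memory, sorted runs are spilled to temporary files once the buffer is full, and the runs are then merged back in timestamp order.

```python
import heapq
import os
import pickle
import tempfile


def external_sort(events, max_in_memory, spill_dir):
    """Yield (commit_ts, payload) tuples in commit_ts order.

    Hypothetical illustration only: at most max_in_memory events are kept
    in memory; full buffers are written to spill_dir as sorted runs.
    """
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it out as one sorted run.
        buffer.sort(key=lambda e: e[0])
        fd, path = tempfile.mkstemp(prefix="sort-", dir=spill_dir)
        with os.fdopen(fd, "wb") as f:
            for event in buffer:
                pickle.dump(event, f)
        run_paths.append(path)
        buffer.clear()

    for event in events:
        buffer.append(event)
        if len(buffer) >= max_in_memory:
            spill()

    # Whatever is left stays in memory as the final sorted run.
    buffer.sort(key=lambda e: e[0])

    def read_run(path):
        # Stream one sorted run back from disk, one event at a time.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    try:
        runs = [read_run(p) for p in run_paths] + [iter(buffer)]
        # k-way merge of already-sorted runs.
        yield from heapq.merge(*runs, key=lambda e: e[0])
    finally:
        # This sketch removes its own temporary files when the merge ends.
        for p in run_paths:
            os.remove(p)
```

The more data that piles up before it can be merged and sent downstream, the more runs end up on disk, which is why a large backlog translates directly into disk usage in the sort directory. The cleanup at the end is a choice made in this sketch and says nothing about when TiCDC removes its own files.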
If the server uses mechanical hard drives or other storage devices with latency or throughput bottlenecks, the performance of Unified Sorter will be significantly affected.
Unified Sorter uses data_dir to store temporary files by default. It is recommended to ensure that the free capacity of the hard drive is greater than or equal to 500 GiB. For production environments, it is recommended to ensure that the available disk space on each node is greater than the (maximum allowable) checkpoint-ts delay * peak upstream write traffic. Additionally, if a large amount of historical data is expected to be synchronized after the changefeed is created, ensure that the free capacity on each node is greater than the amount of data to be caught up.
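As a rough way to apply the sizing rule above, the following Python sketch takes the largest of three numbers: the 500 GiB baseline, the lag-times-traffic estimate, and the expected historical backlog. The function name and the example figures (1 hour or 6 hours of lag, 50 MiB/s peak writes) are assumptions for illustration, not measured values or official guidance.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def required_sorter_space_bytes(max_checkpoint_lag_seconds,
                                peak_write_bytes_per_second,
                                backlog_bytes=0,
                                baseline_bytes=500 * GIB):
    """Estimate the free space to keep available on each CDC node.

    Hypothetical helper: returns the largest of the 500 GiB baseline,
    (max tolerated checkpoint-ts lag * peak upstream write traffic),
    and the amount of historical data expected to be caught up.
    """
    lag_based = max_checkpoint_lag_seconds * peak_write_bytes_per_second
    return max(baseline_bytes, lag_based, backlog_bytes)


# Example with assumed numbers: 1 hour of tolerated lag at 50 MiB/s is
# about 176 GiB, so the 500 GiB baseline is the binding requirement.
one_hour = required_sorter_space_bytes(3600, 50 * MIB)

# With 6 hours of tolerated lag the lag-based term (about 1055 GiB) wins.
six_hours = required_sorter_space_bytes(6 * 3600, 50 * MIB)
```

Plug in the real peak write rate from your upstream monitoring; the result is only as good as that estimate.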
Here I have two more questions: 1. Will these sorter files be automatically deleted after data synchronization is completed? 2. Why are there sorter files only on the first node and not on the second node?
--sort-dir: Specifies the temporary file directory used by the sorting engine. It is not recommended to use this option in cdc cli changefeed create; instead, it is recommended to use this option in the cdc server command to set the temporary file directory. The default value for this configuration item is /tmp/cdc_sort. When Unified Sorter is enabled, if this directory on the server is not writable or has insufficient available space, please manually specify the sort-dir. If the directory corresponding to sort-dir is not writable, the changefeed will automatically stop.
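Since an unwritable or undersized sort directory causes trouble, it can help to check the directory on each CDC node before pointing sort-dir at it. The Python sketch below is a standalone pre-flight check using only the standard library. It is not part of TiCDC, and the function name and the 500 GiB threshold in the usage comment are assumptions.

```python
import os
import shutil


def check_sort_dir(path, min_free_bytes):
    """Return a list of problems found with a candidate sort directory.

    Hypothetical helper: an empty list means the directory exists, is
    writable by the current user, and has at least min_free_bytes free.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    # Creating files in a directory needs write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(
            f"{path} has {free} bytes free, below the required {min_free_bytes}"
        )
    return problems


# Usage, run as the same OS user that runs the cdc server process:
#   for problem in check_sort_dir("/tmp/cdc_sort", 500 * 1024 ** 3):
#       print(problem)
```

Run it on both CDC nodes, because the two nodes may have different disks mounted at the same path.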