TiCDC Synchronization Error: Too Many Open Files

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticd同步报错,too many open files

| username: 扬仔_tidb

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.3.0
【Reproduction Path】Operations performed that led to the issue
【Encountered Issue: Issue Phenomenon and Impact】
Task stuck, error reported as follows:
"code": "CDC:ErrUnifiedSorterIOError",
"message": "[CDC:ErrUnifiedSorterIOError]unified sorter IO error. Make sure your sort-dir is configured correctly by passing a valid argument or toml file to cdc server, or if you use TiUP, review the settings in tiup cluster edit-config. Details: open /tmp/cdc_data/tmp/sorter/sort-14873-1274288.tmp: too many open files in system"
[tidb@ticdc-2 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 123517
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 123517
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I checked the /tmp/cdc_data/tmp/sorter directory and found over 80,000 files starting with sort-14873.
Has anyone encountered this issue, and how can it be resolved?
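
For anyone wanting to reproduce the file count above, here is a minimal Python sketch that groups the temporary files in a sorter directory by the first number in their name. The pattern is inferred only from the one file name in the error message (sort-14873-1274288.tmp); what the two numbers mean is an assumption, and count_sorter_files is a hypothetical helper, not part of TiCDC.

```python
import os
import re
import tempfile
from collections import Counter

# Pattern inferred from the file name in the error message; an assumption,
# not documented TiCDC behaviour.
SORT_FILE = re.compile(r"^sort-(\d+)-(\d+)\.tmp$")


def count_sorter_files(sort_dir):
    """Count matching files in sort_dir, keyed by the first numeric field."""
    counts = Counter()
    with os.scandir(sort_dir) as entries:
        for entry in entries:
            match = SORT_FILE.match(entry.name)
            if match and entry.is_file():
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Demo on a throwaway directory; point count_sorter_files at the real
    # sorter directory (for example /tmp/cdc_data/tmp/sorter) to inspect it.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("sort-14873-1.tmp", "sort-14873-2.tmp", "sort-200-1.tmp", "other.log"):
            open(os.path.join(demo_dir, name), "w").close()
        print(count_sorter_files(demo_dir))
```

A count that keeps growing for a single prefix would be consistent with what is described above, but the sketch only counts files; it does not say why they accumulate.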

| username: dba远航 | Original post link

Check the current limits with ulimit -a. If the open files limit is too low, raise it with ulimit -HSn [number], or edit /etc/security/limits.conf directly.

| username: 扬仔_tidb | Original post link

open files (-n) 1000000

| username: 芮芮是产品 | Original post link

Delete the tmp files and resynchronize.

| username: 扬仔_tidb | Original post link

May I ask if deleting these tmp files and resynchronizing will resume synchronization from the stuck point?
What caused this issue?
Synchronization with TiCDC was working fine before, but it got stuck at 4 AM this morning.

| username: 小龙虾爱大龙虾 | Original post link

If you are sure that the open files limit was already that large before CDC started, then it is most likely a bug. I searched and found similar bugs that have been resolved in later versions:
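
One way to check the limit the running process actually has, rather than the limit of a login shell, is to read /proc/<pid>/limits on Linux. The sketch below is only an illustration: parse_max_open_files and read_process_nofile are hypothetical helper names, and it assumes the usual column layout of that file (limit name, soft limit, hard limit, units).

```python
import os


def _to_int(value):
    """Convert a limit field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row, or None if absent."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return _to_int(fields[3]), _to_int(fields[4])
    return None


def read_process_nofile(pid):
    """Read the open files limit of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())


if __name__ == "__main__":
    # Demo against the current Python process; replace "self" with the
    # PID of the cdc server process to inspect that process instead.
    if os.path.exists("/proc/self/limits"):
        print(read_process_nofile("self"))
```

If the value reported for the cdc process is much lower than what ulimit -a shows in a shell, the process was started with different limits than the shell has.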

Given that your TiDB version is relatively low, it is recommended that you upgrade to the latest LTS version as soon as possible.

| username: 扬仔_tidb | Original post link

This is the log from that time. The last synchronized data in the table is from 04:08. However, cdc.log shows no significant error messages around that time.

| username: xfworld | Original post link

Check if there are any DDL changes upstream or downstream, as this can cause a stall…

If this is the case, it is recommended to handle it manually and restart the synchronization.

| username: Kongdom | Original post link

It looks like it has been fixed in v5.4.1; you can upgrade and verify it.

This problem has been solved by dbsorter, which is enabled by default from 6.0 onward. It is also available on 5.4.1, which satisfies the needs of the cloud. So I’m closing this issue.

In the second issue, it appears to have been fixed in v5.3, but it’s not clear which minor version.

ti-chi-bot merged commit 865d357 into pingcap:release-5.3 on Nov 10, 2021

| username: Fly-bird | Original post link

Is it possible that the “too many open files” limit in the system is restricting it? Try setting it to 200,000 and check with ulimit -a.
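
On Linux, the wording "too many open files in system" is the message for ENFILE (the system-wide file table is full), while plain "too many open files" is the message for EMFILE (the per-process limit that ulimit -n controls). The sketch below is a rough aid under that assumption: it classifies a message by its wording and parses the three numbers in /proc/sys/fs/file-nr (allocated handles, unused handles, system-wide maximum). Both function names are hypothetical, and matching on message text is a heuristic, not a reliable diagnosis.

```python
import errno
import os


def classify_open_files_error(message):
    """Guess the errno behind an error message from its wording."""
    lowered = message.lower()
    if "too many open files in system" in lowered:
        return errno.ENFILE  # system-wide limit (fs.file-max)
    if "too many open files" in lowered:
        return errno.EMFILE  # per-process limit (ulimit -n)
    return None


def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr into a dict."""
    allocated, unused, maximum = (int(part) for part in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


if __name__ == "__main__":
    msg = "open /tmp/cdc_data/tmp/sorter/sort-14873-1274288.tmp: too many open files in system"
    print(errno.errorcode[classify_open_files_error(msg)])
    if os.path.exists("/proc/sys/fs/file-nr"):
        with open("/proc/sys/fs/file-nr") as handle:
            print(parse_file_nr(handle.read()))
```

If the allocated count is close to the maximum, raising the per-process limit alone may not help, since the system-wide ceiling would be the one being hit.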

| username: zhaokede | Original post link

Modify the system configuration file to adjust the number of open files; or upgrade to the new version of TiDB.

| username: 随缘天空 | Original post link

It is probably a bug, because the parameter is already set to 65536.

| username: oceanzhang | Original post link

Too many file handles.

| username: oceanzhang | Original post link

Can it really exceed 65536? Is it actually using that much, or is it a bug?

| username: oceanzhang | Original post link

For installation and deployment, tiup should automatically set these parameters.

| username: 随缘天空 | Original post link

It didn’t use that much. It was set during system parameter optimization during installation, and this parameter is already sufficient.

| username: 路在何chu | Original post link

Do you have that many tables in your database?

| username: oceanzhang | Original post link

Have you identified the issue so far?

| username: oceanzhang | Original post link

I feel that although TiDB is highly regarded, it does have quite a few bugs.

| username: 随缘天空 | Original post link

I'm not the one who asked; you can ask the original poster.