TiCDC Anomaly

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 异常

| username: 表渣渣渣

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.1
[Reproduction Path]
Last night I created a table, added columns to a table, wrote data, created views, and performed other operations. After that, TiCDC began failing repeatedly.

Create table (for safety, I changed the column names, table name, and comments; everything else is unchanged):

CREATE TABLE dim.`1111_vest` (
  `id` int(11) NOT NULL COMMENT 'Primary Key',
  `1111_id` int(11) DEFAULT NULL COMMENT 'id',
  `1111_name` varchar(48) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '',
  `1111_name` varchar(100) DEFAULT NULL COMMENT '',
  `qy_wx_id` varchar(48) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '',
  `1111_phone` varchar(24) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '',
  `1111_dep` varchar(100) DEFAULT NULL COMMENT '',
  `owner_name` varchar(100) DEFAULT NULL COMMENT '',
  `owner_job_id` varchar(100) DEFAULT NULL COMMENT '',
  `owner_dep` varchar(100) DEFAULT NULL COMMENT '',
  `consult_id` int(11) DEFAULT NULL COMMENT '',
  `owner_222_id` int(11) DEFAULT NULL COMMENT '',
  `owner_333_id` int(11) DEFAULT NULL COMMENT 'id',
  `owner_333_name` varchar(50) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '',
  `owner_333_222_id` int(11) DEFAULT NULL COMMENT 'id',
  `owner_333_222_name` varchar(50) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '',
  `owner_333_center_id` int(11) DEFAULT NULL COMMENT 'id',
  `owner_333_center_name` varchar(200) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '',
  `dddd_type` int(11) NOT NULL DEFAULT '0' COMMENT '',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Add fields:

alter table dim.dim_2222_user add column `555_333_center_name` varchar(200) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `555_333_center_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `555_333_111_name` varchar(50) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `555_333_111_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `555_333_name` varchar(50) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `555_333_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `555_111_name` varchar(45) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `555_job_number` varchar(32) COLLATE utf8mb4_general_ci DEFAULT '' COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `555_name` varchar(45) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `555_111_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `555_consult_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `000_333_center_name` varchar(200) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `000_333_center_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `000_333_111_name` varchar(50) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `000_333_111_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `000_333_name` varchar(50) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '' after consult_id;
alter table dim.dim_2222_user add column `000_333_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;
alter table dim.dim_2222_user add column `000_111_id` int(11) DEFAULT NULL COMMENT 'id' after consult_id;

After that, CDC stopped syncing: the three CDC components crashed one after another.
I stopped all writes to the table created yesterday and to the table that had columns added.
I traced the problem to the dim.* sync, so I deleted and recreated the CDC sync task.

But as soon as the task started, it reported an error.

This lasted until the afternoon, and then it started working normally again :dotted_line_face:

During this period, I stopped, deleted, and recreated the CDC sync task multiple times.

I am confused about what happened.

[Attachment: Screenshot/Log/Monitoring]

I found these exceptions in the log:

[2024/06/27 23:43:27.459 +08:00] [ERROR] [client.go:752] ["[pd] fetch pending tso requests error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]context canceled: context canceled"]
[2024/06/27 23:43:27.459 +08:00] [INFO] [client.go:666] ["[pd] exit tso dispatcher"] [dc-location=global]
[2024/06/27 23:43:27.459 +08:00] [INFO] [capture.go:323] ["the processor routine has exited"] [error="[CDC:ErrPDEtcdAPIError]etcd api call error: context canceled"] [errorVerbose="[CDC:ErrPDEtcdAPIError]etcd api call error: context canceled\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/errors.WrapError\n\tgithub.com/pingcap/tiflow/pkg/errors/helper.go:30\ngithub.com/pingcap/tiflow/cdc/capture.(*Capture).runEtcdWorker\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:476\ngithub.com/pingcap/tiflow/cdc/capture.(*Capture).run.func3\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:322\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2024/06/27 23:43:27.459 +08:00] [INFO] [acquirer.go:72] ["TimeAcquirer exit"]
[2024/06/27 23:43:27.459 +08:00] [INFO] [client.go:234] ["WatchWithChan exited"] [role=processor]
[2024/06/27 23:43:27.459 +08:00] [INFO] [capture.go:299] ["the owner routine has exited"] []
[2024/06/27 23:43:27.459 +08:00] [ERROR] [http_status.go:74] ["http server error"] [error="[CDC:ErrServeHTTP]serve http error: mux: server closed"] [errorVerbose="[CDC:ErrServeHTTP]serve http error: mux: server closed\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/errors.WrapError\n\tgithub.com/pingcap/tiflow/pkg/errors/helper.go:30\ngithub.com/pingcap/tiflow/cdc.(*Server).startStatusHTTP.func1\n\tgithub.com/pingcap/tiflow/cdc/http_status.go:74\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2024/06/27 23:43:27.460 +08:00] [INFO] [capture.go:257] ["capture recovered"] [capture-id=99bbe3c0-fe7d-47f4-8619-684e223a4d7a]
[2024/06/27 23:43:27.460 +08:00] [INFO] [capture.go:234] ["the capture routine has exited"]
[2024/06/27 23:43:27.460 +08:00] [ERROR] [client.go:752] ["[pd] fetch pending tso requests error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]context canceled: context canceled"]
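
For anyone triaging a similar log, here is a minimal Python sketch that counts the bracketed error codes per log level. It assumes every entry follows the `[time] [LEVEL] [file:line] ...` layout seen above; the names `HEADER`, `ERROR_CODE`, `summarize_error_codes`, and `SAMPLE` are made up for illustration and are not part of TiCDC. Note that in the excerpt above the `CDC:ErrPDEtcdAPIError` entry is logged at INFO level, so filtering on ERROR alone would miss it.

```python
import re
from collections import Counter

# Header of one log entry: [timestamp] [LEVEL] [file:line]
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] "
)
# Bracketed error codes such as [CDC:ErrPDEtcdAPIError] or [PD:client:ErrClientGetTSO]
ERROR_CODE = re.compile(r"\[([A-Za-z]+(?::[A-Za-z]+)*:Err\w+)\]")


def summarize_error_codes(lines):
    """Count (level, first error code) pairs across log lines."""
    counts = Counter()
    for line in lines:
        header = HEADER.match(line)
        if header is None:
            continue  # not a log entry header (e.g. a wrapped continuation line)
        code = ERROR_CODE.search(line)
        if code:
            counts[(header["level"], code.group(1))] += 1
    return counts


# Entries shortened from the log excerpt in this post (stack traces dropped).
SAMPLE = [
    '[2024/06/27 23:43:27.459 +08:00] [ERROR] [client.go:752] '
    '["[pd] fetch pending tso requests error"] [dc-location=global] '
    '[error="[PD:client:ErrClientGetTSO]context canceled: context canceled"]',
    '[2024/06/27 23:43:27.459 +08:00] [INFO] [capture.go:323] '
    '["the processor routine has exited"] '
    '[error="[CDC:ErrPDEtcdAPIError]etcd api call error: context canceled"]',
    '[2024/06/27 23:43:27.459 +08:00] [INFO] [acquirer.go:72] ["TimeAcquirer exit"]',
]

if __name__ == "__main__":
    for (level, code), n in sorted(summarize_error_codes(SAMPLE).items()):
        print(level, code, n)
```

Running it against the full cdc.log instead of `SAMPLE` (for example `summarize_error_codes(open(path))`, where `path` is wherever your deployment writes the log) gives a quick view of which codes dominate around the time of the crash.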

| username: 表渣渣渣

To clarify: our CDC sync task replicates all tables in the dim database.