TiDB 5.2.1 TiCDC synchronization to Kafka does not advance TSO and checkpoint after creation without errors

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 5.2.1 ticdc 同步到kafka创建后 无报错 tso 和checkpoint 不推进

| username: TiDBer_ZsnVPQB4

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.2.1
[Reproduction Path]
– Create a full database CDC synchronization
tiup ctl:v5.2.1 cdc changefeed create --pd=http://10.30.30.4:2379 --sink-uri=“kafka://node3.gcl-kafka.com:9092/ticdc_jnpf_tenant_859852?kafka-version=2.13.3&partition-num=3&max-message-bytes=128108864&replication-factor=1&protocol=canal-json” --changefeed-id=“ticdc-jnpf-tenant-859852” --config=/root/ops/cdc-config-jnpf_tenant_859852.conf

– View all task lists
tiup ctl:v5.2.1 cdc changefeed list --pd=http://10.30.30.4:2379
– Delete synchronization task
tiup ctl:v5.2.1 cdc changefeed remove --pd=http://10.30.30.4:2379 --changefeed-id ticdc-jnpf-tenant-859852
– Delete topic
/usr/local/kafka/bin/kafka-topics.sh --delete --bootstrap-server 10.30.254.2:9092 --topic ticdc_jnpf_tenant_859852

Deleting the synchronization task and deleting the topic, then recreating it still results in the same error.
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
– No errors in the CDC log, it just outputs “add a table” and then stops.


– Multiple queries show no errors, TSO and checkpoint remain unchanged.

| username: TiDBer_ZsnVPQB4 | Original post link

– Some prompt information
requestID=220869] [span=“[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)”] [checkpoint=437455110267731969] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1217751 > “]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_cache.go:1070] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=1217751]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.290 +08:00] [INFO] [client.go:876] [“cannot get rpcCtx, retry span”] [regionID=1217751] [span=”[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)”]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.291 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.291 +08:00] [INFO] [client.go:774] [“start new request”] [request=“{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220870,"Request":null}”] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.291 +08:00] [WARN] [client.go:781] [“send request to stream failed”] [addr=10.30.30.11:20160] [storeID=1] [regionID=1217751] [requestID=220870] [error=EOF]
[2022/11/18 16:06:53.326 +08:00] [INFO] [client_v2.go:163] [“stream to store closed”] [addr=10.30.30.11:20160] [storeID=1]
[2022/11/18 16:06:53.336 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.336 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.336 +08:00] [INFO] [client.go:721] [“creating new stream to store to send request”] [regionID=1217751] [requestID=220871] [storeID=1] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.336 +08:00] [INFO] [client.go:774] [“start new request”] [request=“{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220871,"Request":null}”] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_worker.go:249] [“single region event feed disconnected”] [regionID=1217751] [requestID=220871] [span=“[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)”] [checkpoint=437455110267731969] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1217751 leader:<id:1217754 store_id:5 > > “]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_cache.go:1083] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=1217751] [currIdx=0] [leaderStoreID=5]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.337 +08:00] [INFO] [client.go:721] [“creating new stream to store to send request”] [regionID=1217751] [requestID=220872] [storeID=5] [addr=10.30.30.10:20160]
[2022/11/18 16:06:53.338 +08:00] [INFO] [client.go:774] [“start new request”] [request=”{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220872,"Request":null}”] [addr=10.30.30.10:20160]
[2022/11/18 16:06:53.373 +08:00] [INFO] [client_v2.go:163] [“stream to store closed”] [addr=10.30.30.11:20160] [storeID=1]

| username: hey-hoho | Original post link

Please provide the content of the changefeed configuration file.

| username: TiDBer_ZsnVPQB4 | Original post link

# Specify whether the database and table names involved in the configuration file are case-sensitive
# This configuration will affect both filter and sink related configurations, default is true
case-sensitive = true

# Whether to output old value, supported from v4.0.5, default is true from v5.0
enable-old-value = true

[filter]
# Ignore transactions with specified start_ts
#ignore-txn-start-ts = [1, 2]

# Filter rules
# Filter rule syntax: https://docs.pingcap.com/zh/tidb/stable/table-filter#table-database-filter-syntax
rules = ['jnpf_tenant_859852.*']

[mounter]
# Number of mounter threads, used to decode data output from TiKV
worker-num = 4

[sink]
# For MQ type Sinks, event dispatchers can be configured through dispatchers
# Supports four dispatchers: default, ts, rowid, table. The dispatch rules are as follows:
# - default: If there are multiple unique indexes (including primary key), dispatch according to table mode; if there is only one unique index (or primary key), dispatch according to rowid mode; if the old value feature is enabled, dispatch according to table
# - ts: Dispatch events by calculating Hash based on the commitTs of row changes
# - rowid: Dispatch events by calculating Hash based on the primary key or unique index column name and column value of the table
# - table: Dispatch events by calculating Hash based on the schema name and table name of the table
# The matching syntax of matcher is the same as the filter rule syntax
dispatchers = []
#    {matcher = ['test1.*', 'test2.*'], dispatcher = "ts"},
#    {matcher = ['test3.*', 'test4.*'], dispatcher = "rowid"},
#]
# For MQ type Sinks, the protocol format of the message can be specified
# Currently supports four protocols: default, canal, avro, and maxwell. default is TiCDC Open Protocol
protocol = "canal"

[cyclic-replication]
# Whether to enable cyclic replication
enable = false
# Current TiCDC replica ID
#replica-id = 1
# IDs to be filtered out from replication
#filter-replica-ids = [2,3]
# Whether to synchronize DDL
#sync-ddl = true
| username: hey-hoho | Original post link

Try inserting a test data entry into that database and observe the TiCDC logs to see if there are any error messages.

| username: asddongmen | Original post link

May I ask how many tables are there in the database you need to synchronize? For version 5.2.1, it is recommended that the total number of tables synchronized by TiCDC should not exceed 1.5k.

| username: Meditator | Original post link

Take a look at whether the region with regionID=1217751 has any anomalies?

| username: TiDBer_ZsnVPQB4 | Original post link

813 tables, not yet 1.5K.

| username: TiDBer_ZsnVPQB4 | Original post link

Deleting and recreating will change to another regionID. The time from getting stuck at “add a table” to the error should be close to the tidb_gc_life_time, which is currently set to 30 minutes.

| username: TiDBer_ZsnVPQB4 | Original post link

The database has been receiving data continuously, and no other abnormal information has been observed.

| username: TiDBer_ZsnVPQB4 | Original post link

Currently, I have restarted the two CDC nodes and changed the tidb_gc_life_time from 30m0s to 10m0s, and synchronization is now working. After changing it back to 30m0s, synchronization and checkpoint advancement are still functioning. This big move solved the problem.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.