Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb 5.2.1 ticdc 同步到kafka创建后 无报错 tso 和checkpoint 不推进
[TiDB Usage Environment] Production Environment
[TiDB Version] v5.2.1
[Reproduction Path]
– Create a full database CDC synchronization
tiup ctl:v5.2.1 cdc changefeed create --pd=http://10.30.30.4:2379 --sink-uri=“kafka://node3.gcl-kafka.com:9092/ticdc_jnpf_tenant_859852?kafka-version=2.13.3&partition-num=3&max-message-bytes=128108864&replication-factor=1&protocol=canal-json” --changefeed-id=“ticdc-jnpf-tenant-859852” --config=/root/ops/cdc-config-jnpf_tenant_859852.conf
– View all task lists
tiup ctl:v5.2.1 cdc changefeed list --pd=http://10.30.30.4:2379
– Delete synchronization task
tiup ctl:v5.2.1 cdc changefeed remove --pd=http://10.30.30.4:2379 --changefeed-id ticdc-jnpf-tenant-859852
– Delete topic
/usr/local/kafka/bin/kafka-topics.sh --delete --bootstrap-server 10.30.254.2:9092 --topic ticdc_jnpf_tenant_859852
Deleting the synchronization task and deleting the topic, then recreating it still results in the same error.
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
– No errors in the CDC log, it just outputs “add a table” and then stops.
– Multiple queries show no errors, TSO and checkpoint remain unchanged.
– Some prompt information
requestID=220869] [span=“[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)”] [checkpoint=437455110267731969] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1217751 > “]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_cache.go:1070] [“switch region peer to next due to NotLeader with NULL leader”] [currIdx=0] [regionID=1217751]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.290 +08:00] [INFO] [client.go:876] [“cannot get rpcCtx, retry span”] [regionID=1217751] [span=”[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)”]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.291 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.291 +08:00] [INFO] [client.go:774] [“start new request”] [request=“{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220870,"Request":null}”] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.291 +08:00] [WARN] [client.go:781] [“send request to stream failed”] [addr=10.30.30.11:20160] [storeID=1] [regionID=1217751] [requestID=220870] [error=EOF]
[2022/11/18 16:06:53.326 +08:00] [INFO] [client_v2.go:163] [“stream to store closed”] [addr=10.30.30.11:20160] [storeID=1]
[2022/11/18 16:06:53.336 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.336 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.336 +08:00] [INFO] [client.go:721] [“creating new stream to store to send request”] [regionID=1217751] [requestID=220871] [storeID=1] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.336 +08:00] [INFO] [client.go:774] [“start new request”] [request=“{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220871,"Request":null}”] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_worker.go:249] [“single region event feed disconnected”] [regionID=1217751] [requestID=220871] [span=“[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)”] [checkpoint=437455110267731969] [error=“[CDC:ErrEventFeedEventError]not_leader:<region_id:1217751 leader:<id:1217754 store_id:5 > > “]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_range_lock.go:370] [“unlocked range”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_cache.go:1083] [“switch region leader to specific leader due to kv return NotLeader”] [regionID=1217751] [currIdx=0] [leaderStoreID=5]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_range_lock.go:218] [“range locked”] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.337 +08:00] [INFO] [client.go:721] [“creating new stream to store to send request”] [regionID=1217751] [requestID=220872] [storeID=5] [addr=10.30.30.10:20160]
[2022/11/18 16:06:53.338 +08:00] [INFO] [client.go:774] [“start new request”] [request=”{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220872,"Request":null}”] [addr=10.30.30.10:20160]
[2022/11/18 16:06:53.373 +08:00] [INFO] [client_v2.go:163] [“stream to store closed”] [addr=10.30.30.11:20160] [storeID=1]
Please provide the content of the changefeed configuration file.
# Specify whether the database and table names involved in the configuration file are case-sensitive
# This configuration will affect both filter and sink related configurations, default is true
case-sensitive = true
# Whether to output old value, supported from v4.0.5, default is true from v5.0
enable-old-value = true
[filter]
# Ignore transactions with specified start_ts
#ignore-txn-start-ts = [1, 2]
# Filter rules
# Filter rule syntax: https://docs.pingcap.com/zh/tidb/stable/table-filter#table-database-filter-syntax
rules = ['jnpf_tenant_859852.*']
[mounter]
# Number of mounter threads, used to decode data output from TiKV
worker-num = 4
[sink]
# For MQ type Sinks, event dispatchers can be configured through dispatchers
# Supports four dispatchers: default, ts, rowid, table. The dispatch rules are as follows:
# - default: If there are multiple unique indexes (including primary key), dispatch according to table mode; if there is only one unique index (or primary key), dispatch according to rowid mode; if the old value feature is enabled, dispatch according to table
# - ts: Dispatch events by calculating Hash based on the commitTs of row changes
# - rowid: Dispatch events by calculating Hash based on the primary key or unique index column name and column value of the table
# - table: Dispatch events by calculating Hash based on the schema name and table name of the table
# The matching syntax of matcher is the same as the filter rule syntax
dispatchers = []
# {matcher = ['test1.*', 'test2.*'], dispatcher = "ts"},
# {matcher = ['test3.*', 'test4.*'], dispatcher = "rowid"},
#]
# For MQ type Sinks, the protocol format of the message can be specified
# Currently supports four protocols: default, canal, avro, and maxwell. default is TiCDC Open Protocol
protocol = "canal"
[cyclic-replication]
# Whether to enable cyclic replication
enable = false
# Current TiCDC replica ID
#replica-id = 1
# IDs to be filtered out from replication
#filter-replica-ids = [2,3]
# Whether to synchronize DDL
#sync-ddl = true
Try inserting a test data entry into that database and observe the TiCDC logs to see if there are any error messages.
May I ask how many tables are there in the database you need to synchronize? For version 5.2.1, it is recommended that the total number of tables synchronized by TiCDC should not exceed 1.5k.
Take a look at whether the region with regionID=1217751 has any anomalies?
813 tables, not yet 1.5K.
Deleting and recreating will change to another regionID. The time from getting stuck at “add a table” to the error should be close to the tidb_gc_life_time, which is currently set to 30 minutes.
The database has been receiving data continuously, and no other abnormal information has been observed.
Currently, I have restarted the two CDC nodes and changed the tidb_gc_life_time
from 30m0s to 10m0s, and synchronization is now working. After changing it back to 30m0s, synchronization and checkpoint advancement are still functioning. This big move solved the problem.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.