How to Handle region_not_found

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: region_not_found 怎么处理

| username: Hacker_eZSjet7O

【TiDB Usage Environment】Production Environment
【TiDB Version】v7.5.0
【Reproduction Path】Continuous errors
【Encountered Problem: Phenomenon and Impact】No impact found yet, but there are continuous errors in the logs.
【Resource Configuration】
TiCDC keeps raising the ticdc_changefeed_meet_error alert. The cdc logs show:

[2024/05/23 19:02:21.997 +08:00] [INFO] [shared_stream.go:481] ["event feed receives a region error"] [namespace=default] [changefeed=centre-mds] [streamID=218] [subscriptionID=571] [regionID=64577] [stateIsNil=false] [error="region_not_found:<region_id:64577 > "]
[2024/05/23 19:02:21.997 +08:00] [INFO] [shared_region_worker.go:108] ["region worker get a region error"] [namespace=default] [changefeed=centre-mds] [streamID=218] [subscriptionID=64577] [regionID=64577] [reschedule=true] [error="region_not_found:<region_id:64577 > "]
[2024/05/23 19:11:15.430 +08:00] [INFO] [middleware.go:49] [/api/v2/changefeeds] [status=200] [method=GET] [path=/api/v2/changefeeds] [query="namespace=default&state=all"] [ip=192.168.217.184] [user-agent=Go-http-client/1.1] [client-version=v7.5.0] [duration=82.221133ms]
[2024/05/23 19:32:48.760 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=288.811611ms]
[2024/05/23 20:02:39.103 +08:00] [INFO] [shared_stream.go:481] ["event feed receives a region error"] [namespace=default] [changefeed=centre-mds] [streamID=218] [subscriptionID=571] [regionID=64549] [stateIsNil=false] [error="epoch_not_match:<current_regions:<id:64581 start_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377sr\201\000\000\000\000\000\372" end_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377s\305X\000\000\000\000\000\372" region_epoch:<conf_ver:5 version:1547 > peers:<id:64582 store_id:1 > peers:<id:64583 store_id:2 > peers:<id:64584 store_id:3 > > current_regions:<id:64549 start_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377s\305X\000\000\000\000\000\372" end_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377s\362\336\000\000\000\000\000\372" region_epoch:<conf_ver:5 version:1547 > peers:<id:64550 store_id:1 > peers:<id:64551 store_id:2 > peers:<id:64552 store_id:3 > > > "]
[2024/05/23 20:02:39.103 +08:00] [INFO] [shared_region_worker.go:108] ["region worker get a region error"] [namespace=default] [changefeed=centre-mds] [streamID=218] [subscriptionID=64549] [regionID=64549] [reschedule=true] [error="epoch_not_match:<current_regions:<id:64581 start_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377sr\201\000\000\000\000\000\372" end_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377s\305X\000\000\000\000\000\372" region_epoch:<conf_ver:5 version:1547 > peers:<id:64582 store_id:1 > peers:<id:64583 store_id:2 > peers:<id:64584 store_id:3 > > current_regions:<id:64549 start_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377s\305X\000\000\000\000\000\372" end_key:"t\200\000\000\000\000\000\002\377\033_r\200\000\000\000\001\377s\362\336\000\000\000\000\000\372" region_epoch:<conf_ver:5 version:1547 > peers:<id:64550 store_id:1 > peers:<id:64551 store_id:2 > peers:<id:64552 store_id:3 > > > "]
[2024/05/23 20:02:39.104 +08:00] [INFO] [shared_stream.go:481] ["event feed receives a region error"] [namespace=default] [changefeed=centre-mds] [streamID=218] [subscriptionID=571] [regionID=64581] [stateIsNil=false] [error="region_not_found:<region_id:64581 > "]
[2024/05/23 20:02:39.105 +08:00] [INFO] [shared_region_worker.go:108] ["region worker get a region error"] [namespace=default] [changefeed=centre-mds] [streamID=218] [subscriptionID=64581] [regionID=64581] [reschedule=true] [error="region_not_found:<region_id:64581 > "]

| username: Hacker_eZSjet7O

Please advise on how to handle this.

| username: 小龙虾爱大龙虾

If the changefeed shows no replication lag, there is nothing to handle. Logs like these appear during normal operation and are all at the INFO level, so no action is needed.
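
One way to check "no delay" is to compare the changefeed's checkpoint TSO with the current time. The sketch below relies only on the TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names and the 60-second threshold are illustrative assumptions, not part of TiCDC; the checkpoint TSO itself would come from whatever your tooling reports for the changefeed.

```python
from datetime import datetime, timezone

# A TiDB/PD TSO packs a physical timestamp (milliseconds since the Unix epoch)
# into the high bits and an 18-bit logical counter into the low bits.
TSO_LOGICAL_BITS = 18


def tso_to_datetime(tso: int) -> datetime:
    """Convert a TSO to a timezone-aware UTC datetime (logical part dropped)."""
    physical_ms = tso >> TSO_LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def checkpoint_lag_seconds(checkpoint_tso: int, now: datetime) -> float:
    """Seconds between `now` and the time encoded in the checkpoint TSO."""
    return (now - tso_to_datetime(checkpoint_tso)).total_seconds()


def is_lagging(checkpoint_tso: int, now: datetime,
               threshold_seconds: float = 60.0) -> bool:
    """True when the checkpoint is older than the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return checkpoint_lag_seconds(checkpoint_tso, now) > threshold_seconds
```

In practice `now` would be `datetime.now(timezone.utc)`. A lag that stays small while the INFO logs keep appearing supports the view that the logs themselves are harmless.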

| username: TIDB-Learner

The log also shows "get timestamp too slow".
Check the TiDB and PD dashboards in Grafana.

| username: Hacker_eZSjet7O

There was no ticdc_changefeed_meet_error alert before, but now it keeps firing.

| username: Hacker_eZSjet7O

The alert seems to come from the "region worker get a region error" entries, but I don’t know how to handle them.

| username: Kongdom

Please refer to the FAQ

| username: yytest

  • region_not_found error: This usually means that the requested Region no longer exists on the TiKV store that received the request, typically because the Region was merged or split, or because the request itself was incorrect. Check the PD logs and monitoring for Region merge or split events. Also confirm that the TiCDC configuration is correct and that TiCDC can connect to the TiKV nodes.
  • epoch_not_match error: This indicates that the Region epoch in the request does not match the current epoch in the cluster. It typically occurs after the Region itself changes, for example through a split, a merge, or a replica (peer) membership change. In the log above, the error lists two current_regions (64581 and 64549) with adjacent key ranges, which is consistent with a split. Check the PD logs for related Region change events.
  • Network issues: If these errors are caused by network problems, check the network configuration to ensure that TiCDC can communicate normally with the PD and TiKV nodes.
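
To see which of these errors dominate and at what log level, a small script can tally the cdc log lines. This is a minimal sketch that assumes the log format shown in the original post (a bracketed level followed later by an `[error="<name>:...` field); the function name is hypothetical and it is not a TiCDC tool.

```python
import re
from collections import Counter

# Matches the bracketed log level, e.g. "[INFO]".
LEVEL_RE = re.compile(r"\[(DEBUG|INFO|WARN|ERROR|FATAL)\]")
# Matches the error name at the start of the error field,
# e.g. [error="region_not_found:<region_id:64577 > "].
ERROR_RE = re.compile(r'\[error="?([a-z_]+):')


def summarize_region_errors(lines):
    """Count log lines per (level, error name); lines without both are skipped."""
    counts = Counter()
    for line in lines:
        level = LEVEL_RE.search(line)
        error = ERROR_RE.search(line)
        if level and error:
            counts[(level.group(1), error.group(1))] += 1
    return counts
```

Passing an open log file (for example `summarize_region_errors(open(path, encoding="utf-8"))`, with `path` pointing at your cdc log) yields a count per error name. If every entry is at INFO and the totals line up with Region split or merge activity in PD, that matches the explanation above.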

| username: Hacker_eZSjet7O

It seems to be a bug, but I don’t understand what the solution is.
huge logs for "event feed receives a region error" and "region worker get a region error" during initial scan (Issue #10177, pingcap/tiflow, GitHub)

| username: 像风一样的男子

Everything at the INFO level can be ignored.

| username: Hacker_eZSjet7O

I would like to ignore it, but the alert keeps going off.
It seems to be a bug. Can you help me find a solution? I don’t quite understand it.

| username: zhanggame1

There is nothing to be done about it; I see a lot of these here too.

| username: 友利奈绪

Just ignore the INFO-level logs.

| username: 鱼跃龙门

These are INFO-level messages. Could the cluster be short on resources and overloaded? Do you need to add resources?

| username: 濱崎悟空

Check the monitoring to see the status of the components.

| username: Kongdom

Can this FAQ solve it?

| username: zhaokede

Ignore the INFO-level logs; they probably appear because Regions change dynamically.