TiFlash Node Keeps Restarting

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash节点不停重启 (TiFlash node keeps restarting)

| username: wakaka

【TiDB Usage Environment】Production Environment
【TiDB Version】5.0.6
【Encountered Problem】TiFlash node keeps restarting
【Reproduction Path】Operations performed that led to the issue
【Problem Phenomenon and Impact】
One TiFlash node shows as offline in the Dashboard. After logging into the node, I found that the TiFlash process restarts periodically and no data is being synchronized.
【Attachments】Related logs and monitoring (https://metricstool.pingcap.com/)

tiflash_error.log


| username: wakaka

2022.07.25 16:01:53.394712 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
2022.07.25 16:25:50.967089 [ 28 ] pingcap.tikv: region {295876070,1907,3138} find error: region 295876070 is missing
2022.07.25 16:25:53.829955 [ 38 ] void DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, DB::PreHandledSnapshot*): Code: 49, e.displayText() = DB::Exception: DB::KVStore::checkAndApplySnapshot(const DB::RegionPtrWithBlock&, DB::TMTContext&)::<lambda(DB::RegionMap, const DB::KVStoreTaskLock&)>: range of region 270272686 is overlapped with region 172787948, should not happen, e.what() = DB::Exception, Stack trace:

  1. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x36c09e5]
  2. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36b7575]
  3. bin/tiflash/tiflash() [0x7710255]
  4. bin/tiflash/tiflash(DB::KVStore::handleRegionsByRangeOverlap(std::pair<DB::TiKVRangeKey, DB::TiKVRangeKey> const&, std::function<void (std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)>&&) const+0x52) [0x7314d92]
  5. bin/tiflash/tiflash(DB::KVStore::checkAndApplySnapshot(DB::RegionPtrWithBlock const&, DB::TMTContext&)+0x291) [0x7711441]
  6. bin/tiflash/tiflash(DB::KVStore::handlePreApplySnapshot(DB::RegionPtrWithBlock const&, DB::TMTContext&)+0x162) [0x77125e2]
  7. bin/tiflash/tiflash(DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, DB::PreHandledSnapshot*)+0x3f) [0x732497f]
  8. bin/tiflash/tiflash(DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, void*, unsigned int)+0x1d) [0x7324a3d]
  9. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xc93e59) [0x7fe99301ce59]
  10. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xc8eaf2) [0x7fe993017af2]
  11. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x9fb326) [0x7fe992d84326]
  12. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1db7640) [0x7fe994140640]
  13. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x563206) [0x7fe9928ec206]
  14. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x59d9ff) [0x7fe9929269ff]
  15. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1577c9f) [0x7fe993900c9f]
  16. /lib64/libpthread.so.0(+0x7e64) [0x7fe991b4ee64]
  17. /lib64/libc.so.6(clone+0x6c) [0x7fe99157588c]

2022.07.25 16:26:11.648538 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
2022.07.25 16:51:01.082572 [ 32 ] pingcap.tikv: region {295507460,1913,2785} find error: EpochNotMatch current epoch of region 295507460 is conf_ver: 1913 version: 2786, but you sent conf_ver: 1913 version: 2785
2022.07.25 16:51:02.979909 [ 32 ] pingcap.tikv: region {295724190,1907,3140} find error: EpochNotMatch current epoch of region 295724190 is conf_ver: 1907 version: 3141, but you sent conf_ver: 1907 version: 3140
2022.07.25 16:51:05.811363 [ 38 ] void DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, DB::PreHandledSnapshot*): Code: 49, e.displayText() = DB::Exception: DB::KVStore::checkAndApplySnapshot(const DB::RegionPtrWithBlock&, DB::TMTContext&)::<lambda(DB::RegionMap, const DB::KVStoreTaskLock&)>: range of region 270272686 is overlapped with region 172787948, should not happen, e.what() = DB::Exception, Stack trace:

  1. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x36c09e5]
  2. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36b7575]
  3. bin/tiflash/tiflash() [0x7710255]
  4. bin/tiflash/tiflash(DB::KVStore::handleRegionsByRangeOverlap(std::pair<DB::TiKVRangeKey, DB::TiKVRangeKey> const&, std::function<void (std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)>&&) const+0x52) [0x7314d92]
  5. bin/tiflash/tiflash(DB::KVStore::checkAndApplySnapshot(DB::RegionPtrWithBlock const&, DB::TMTContext&)+0x291) [0x7711441]
  6. bin/tiflash/tiflash(DB::KVStore::handlePreApplySnapshot(DB::RegionPtrWithBlock const&, DB::TMTContext&)+0x162) [0x77125e2]
  7. bin/tiflash/tiflash(DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, DB::PreHandledSnapshot*)+0x3f) [0x732497f]
  8. bin/tiflash/tiflash(DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, void*, unsigned int)+0x1d) [0x7324a3d]
  9. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xc93e59) [0x7f66ef747e59]
  10. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xc8eaf2) [0x7f66ef742af2]
  11. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x9fb326) [0x7f66ef4af326]
  12. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1db7640) [0x7f66f086b640]
  13. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x563206) [0x7f66ef017206]
  14. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x59d9ff) [0x7f66ef0519ff]
  15. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1577c9f) [0x7f66f002bc9f]
  16. /lib64/libpthread.so.0(+0x7e64) [0x7f66ee279e64]
  17. /lib64/libc.so.6(clone+0x6c) [0x7f66edca088c]

2022.07.25 16:51:23.645341 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
2022.07.25 17:16:28.651135 [ 38 ] void DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, DB::PreHandledSnapshot*): Code: 49, e.displayText() = DB::Exception: DB::KVStore::checkAndApplySnapshot(const DB::RegionPtrWithBlock&, DB::TMTContext&)::<lambda(DB::RegionMap, const DB::KVStoreTaskLock&)>: range of region 270272686 is overlapped with region 172787948, should not happen, e.what() = DB::Exception, Stack trace:

  1. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x36c09e5]
  2. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x36b7575]
  3. bin/tiflash/tiflash() [0x7710255]
  4. bin/tiflash/tiflash(DB::KVStore::handleRegionsByRangeOverlap(std::pair<DB::TiKVRangeKey, DB::TiKVRangeKey> const&, std::function<void (std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)>&&) const+0x52) [0x7314d92]
  5. bin/tiflash/tiflash(DB::KVStore::checkAndApplySnapshot(DB::RegionPtrWithBlock const&, DB::TMTContext&)+0x291) [0x7711441]
  6. bin/tiflash/tiflash(DB::KVStore::handlePreApplySnapshot(DB::RegionPtrWithBlock const&, DB::TMTContext&)+0x162) [0x77125e2]
  7. bin/tiflash/tiflash(DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, DB::PreHandledSnapshot*)+0x3f) [0x732497f]
  8. bin/tiflash/tiflash(DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, void*, unsigned int)+0x1d) [0x7324a3d]
  9. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xc93e59) [0x7fbf58a65e59]
  10. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0xc8eaf2) [0x7fbf58a60af2]
  11. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x9fb326) [0x7fbf587cd326]
  12. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1db7640) [0x7fbf59b89640]
  13. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x563206) [0x7fbf58335206]
  14. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x59d9ff) [0x7fbf5836f9ff]
  15. /chj/app/tidb/deploy/tiflash-9000/bin/tiflash/libtiflash_proxy.so(+0x1577c9f) [0x7fbf59349c9f]
  16. /lib64/libpthread.so.0(+0x7e64) [0x7fbf57597e64]
  17. /lib64/libc.so.6(clone+0x6c) [0x7fbf56fbe88c]

2022.07.25 17:16:46.643977 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
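The timestamps in this log suggest a fixed cycle: each process start is followed roughly 24 to 25 minutes later by the same `Code: 49` overlap exception on regions 270272686 and 172787948, and the node comes back about 18 seconds after that. A minimal sketch for extracting that cycle from `tiflash_error.log`, assuming the line format shown above and assuming (not confirmed by TiFlash docs) that the `path` deprecation warning is logged exactly once per process start. `parse`, `uptimes`, and the abbreviated `LOG` excerpt are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# Abbreviated excerpt of tiflash_error.log from the post above; "[...]" marks
# text cut for brevity, and the stack-trace frames are omitted.
LOG = """\
2022.07.25 16:01:53.394712 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
2022.07.25 16:25:53.829955 [ 38 ] void DB::ApplyPreHandledSnapshot([...]): Code: 49, [...]: range of region 270272686 is overlapped with region 172787948, should not happen
2022.07.25 16:26:11.648538 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
2022.07.25 16:51:01.082572 [ 32 ] pingcap.tikv: region {295507460,1913,2785} find error: EpochNotMatch [...]
2022.07.25 16:51:05.811363 [ 38 ] void DB::ApplyPreHandledSnapshot([...]): Code: 49, [...]: range of region 270272686 is overlapped with region 172787948, should not happen
2022.07.25 16:51:23.645341 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
2022.07.25 17:16:28.651135 [ 38 ] void DB::ApplyPreHandledSnapshot([...]): Code: 49, [...]: range of region 270272686 is overlapped with region 172787948, should not happen
2022.07.25 17:16:46.643977 [ 1 ] Application: The configuration “path” is deprecated. Check [storage] section for new style.
"""

# "<date> <time> [ <thread id> ] <message>"
LINE_RE = re.compile(r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[\s*\d+\s*\] (.*)$")
OVERLAP_RE = re.compile(r"range of region (\d+) is overlapped with region (\d+)")
# Assumption: this warning is logged once each time the process starts.
STARTUP_MARKER = "Check [storage] section for new style"


def parse(log: str) -> Tuple[List[datetime], List[datetime], Set[Tuple[int, int]]]:
    """Return (start times, overlap-exception times, overlapping region-id pairs)."""
    starts: List[datetime] = []
    crashes: List[datetime] = []
    pairs: Set[Tuple[int, int]] = set()
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # stack-trace frames and blank lines have no timestamp
        ts = datetime.strptime(m.group(1), "%Y.%m.%d %H:%M:%S.%f")
        msg = m.group(2)
        if STARTUP_MARKER in msg:
            starts.append(ts)
        overlap = OVERLAP_RE.search(msg)
        if overlap:
            crashes.append(ts)
            pairs.add((int(overlap.group(1)), int(overlap.group(2))))
    return starts, crashes, pairs


def uptimes(starts: List[datetime], crashes: List[datetime]) -> List[timedelta]:
    """For each start, the time until the first overlap exception after it."""
    result = []
    for start in starts:
        later = [c for c in crashes if c > start]
        if later:  # the last start has no exception in the excerpt yet
            result.append(min(later) - start)
    return result


if __name__ == "__main__":
    starts, crashes, pairs = parse(LOG)
    for up in uptimes(starts, crashes):
        print("up for", up)
    print("overlapping region pairs:", pairs)
```

On this excerpt every cycle ends on the same region pair, which is consistent with the overlap being persisted across restarts rather than being a one-off race; the `EpochNotMatch` lines are parsed but ignored here.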

| username: 长安是只喵

It feels a bit similar to this

| username: wakaka

Is there a good way to stop it from restarting for now?

| username: wakaka

I tried to scale in the node and then scale it out again, but the node could not be removed.

| username: songxuecheng

See GitHub issue pingcap/tiflash #3435, "release test: `schrodinger/bank true` failed with error `region range overlapped`". Upgrade.
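Issue #3435 concerns the same `region range overlapped` error. The invariant behind the exception is that regions held by one store must cover disjoint key ranges: each region spans a half-open range [start_key, end_key), and an empty end_key stands for the end of the keyspace. A minimal Python sketch of that check follows; `overlaps`, `find_overlaps`, and the `t1_r...` keys are hypothetical, only the region IDs come from the log, and this is not TiFlash's implementation (the real check lives in the C++ `DB::KVStore::checkAndApplySnapshot` from the stack trace, which also deals with TiFlash's own key encoding).

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[bytes, bytes]  # [start_key, end_key); b"" as end_key = unbounded


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """True if two half-open key ranges share at least one key."""
    a_start, a_end = a
    b_start, b_end = b
    a_starts_before_b_ends = b_end == b"" or a_start < b_end
    b_starts_before_a_ends = a_end == b"" or b_start < a_end
    return a_starts_before_b_ends and b_starts_before_a_ends


def find_overlaps(regions: Dict[int, KeyRange], region_id: int, new_range: KeyRange) -> List[int]:
    """IDs of existing regions (other than region_id) whose range intersects new_range."""
    return sorted(
        rid for rid, rng in regions.items()
        if rid != region_id and overlaps(rng, new_range)
    )


if __name__ == "__main__":
    # Hypothetical keys; only the region IDs come from the log above.
    regions = {172787948: (b"t1_r100", b"t1_r200")}
    conflict = find_overlaps(regions, 270272686, (b"t1_r150", b"t1_r300"))
    if conflict:
        print(f"range of region 270272686 is overlapped with region {conflict[0]}")
```

Adjacent regions such as [t1_r100, t1_r200) and [t1_r200, t1_r300) do not overlap, because the end key is exclusive; a snapshot whose range cuts into a region the store still holds is what triggers the "should not happen" exception.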

| username: wakaka

We cannot upgrade at the moment. Here is the current situation: "tiup prune has no effect" - TiDB Q&A Community

| username: alfred

Are there any abnormal logs at the OS level?
