TiFlash instance lost connection when creating TiFlash data replica

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 创建 TiFlash 数据副本时 TiFlash 实例失联

| username: TiDBer_CQ

[TiDB Usage Environment] Production Environment / Testing / Poc
[Attachment: Screenshot / Log / Monitoring]
[ERROR] [Exception.cpp:90] [“Code: 49, e.displayText() = DB::Exception: Check compare(range.getStart(), ext_file.range.getStart()) <= 0 && compare(range.getEnd(), ext_file.range.getEnd()) >= 0 failed: Detected illegal region boundary: range=[940850,950127) file_range=[940850,950128). TiFlash will exit to prevent data inconsistency. If you accept data inconsistency and want to continue the service, set profiles.default.dt_enable_ingest_check=false., e.what() = DB::Exception, Stack trace:\n\n\n 0x18e2808\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int) [tiflash+26093576]\n \tdbms/src/Common/Exception.h:46\n 0x66c5868\tDB::DM::DeltaMergeStore::ingestFiles(std::__1::shared_ptrDB::DM::DMContext const&, DB::DM::RowKeyRange const&, std::__1::vector<DB::DM::ExternalDTFileInfo, std::__1::allocatorDB::DM::ExternalDTFileInfo > const&, bool) [tiflash+107763816]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_Ingest.cpp:562\n 0x7048ec0\tDB::DM::DeltaMergeStore::ingestFiles(DB::Context const&, DB::Settings const&, DB::DM::RowKeyRange const&, std::__1::vector<DB::DM::ExternalDTFileInfo, std::1::allocatorDB::DM::ExternalDTFileInfo > const&, bool) [tiflash+117739200]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore.h:305\n 0x70e493c\tvoid DB::KVStore::checkAndApplyPreHandledSnapshotDB::RegionPtrWithSnapshotFiles(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&) [tiflash+118376764]\n \tdbms/src/Storages/Transaction/ApplySnapshot.cpp:139\n 0x70e36e0\tvoid DB::KVStore::applyPreHandledSnapshotDB::RegionPtrWithSnapshotFiles(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&) [tiflash+118372064]\n \tdbms/src/Storages/Transaction/ApplySnapshot.cpp:430\n 0x714f258\tApplyPreHandledSnapshot [tiflash+118813272]\n \tdbms/src/Storages/Transaction/ProxyFFI.cpp:664\n 
0xfffdecf7e108\t$LT$engine_store_ffi…observer…TiFlashObserver$LT$T$C$ER$GT$$u20$as$u20$raftstore…coprocessor…ApplySnapshotObserver$GT$::post_apply_snapshot::h37b78235775c5464 [libtiflash_proxy.so+23388424]\n 0xfffdedcec374\traftstore::store::worker::region::Runner$LT$EK$C$R$C$T$GT$::handle_pending_applies::hea284cfca343c5ad [libtiflash_proxy.so+37471092]\n 0xfffded439a6c\tyatp::task::future::RawTask$LT$F$GT$::poll::h009a768630ae99e9 [libtiflash_proxy.so+28351084]\n 0xfffdee9e61ac\t$LT$yatp…task…future…Runner$u20$as$u20$yatp…pool…runner…Runner$GT$::handle::h21119c0ed142ec41 [libtiflash_proxy.so+51077548]\n 0xfffded05bc90\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h81ce124c88a5854e [libtiflash_proxy.so+24296592]\n 0xfffded0a5950\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hfd9faa5e3b4e1b6c [libtiflash_proxy.so+24598864]\n 0xfffdee27efa8\tstd::sys::unix::thread::thread::new::thread_start::heb4b82eb38eaaa15 [libtiflash_proxy.so+43315112]\n 0xfffdeb7d88cc\t [libpthread.so.0+35020]\n 0xfffdeb5ca1ec\t [libc.so.6+893420]”] [source=“void DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap *, PreHandledSnapshot *) [PreHandledSnapshot = DB::PreHandledSnapshotWithFiles]”] [thread_id=286]

| username: 有猫万事足 | Original post link

I saw this issue a few days ago.

| username: 有猫万事足 | Original post link

The difference is that in version 7.1.1, the error message does not include the table ID, making it difficult to identify which table is causing the issue.

You might need to test each table individually. However, the actual solutions are either to check for overlapping SSTs with tikv-ctl bad-ssts or to set the TiFlash parameter suggested in the error message to skip this check.

| username: tidb菜鸟一只 | Original post link

I’ve given you a suggestion; see whether it is acceptable for you…

| username: cassblanca | Original post link

Why is there data inconsistency?

| username: CHENGX | Original post link

Where is this parameter adjusted? It seems that I couldn’t find it in the documentation. If there is any relevant explanation, could you please send it over?

| username: tidb菜鸟一只 | Original post link

It’s probably a bug in version 7.1.1. I think I’ve seen similar issues on the forum before.

| username: tidb菜鸟一只 | Original post link

The documentation indeed does not mention this parameter. Taken literally, the message means that the consistency check is not mandatory and can be turned off. The error itself is most likely caused by a bug. If you cannot accept possible data inconsistency, identify the affected table and work out the specific problem from there.

| username: CHENGX | Original post link

I’m not quite sure what this means. Are you saying that the configuration item in the error message can’t actually be set, and that the only option is to check the data consistency of the table?

| username: 有猫万事足 | Original post link

If you are referring to the configuration item profiles.default.dt_enable_ingest_check=false mentioned in the error message, it is indeed not documented.

However, the configuration item in this message should take effect. After all, you can see that TiFlash checks whether this configuration item is enabled before entering this check.

| username: TiDBer_CQ | Original post link

I tried adjusting the replica synchronization speed several times in different ways, but the same issue persisted. It might be due to the large amount of data in the table for which the replica was being created. In the end, I resolved it by first creating a TiFlash replica for an empty table with the same schema, and then gradually copying the data from the original table into the new table.
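
For anyone who wants to try the same workaround, here is a rough sketch of how the statements could be generated in batches. The table names `orders` and `orders_new`, the integer primary key column `id`, the ID range, and the batch size are all assumptions made up for illustration; they are not from the original cluster. The script only prints SQL, it does not connect to TiDB.

```python
def batched_copy_statements(src, dst, min_id, max_id, batch_size):
    """Yield INSERT ... SELECT statements covering ids in [min_id, max_id]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        # Exclusive upper bound of this batch, capped at max_id + 1.
        end = min(start + batch_size, max_id + 1)
        yield (
            f"INSERT INTO {dst} SELECT * FROM {src} "
            f"WHERE id >= {start} AND id < {end};"
        )
        start = end


# Hypothetical names: create the empty copy and its TiFlash replica first,
# then move the rows over one id range at a time.
setup = [
    "CREATE TABLE orders_new LIKE orders;",
    "ALTER TABLE orders_new SET TIFLASH REPLICA 1;",
]
statements = setup + list(
    batched_copy_statements("orders", "orders_new", 1, 250, 100)
)

for statement in statements:
    print(statement)
```

Small batches keep each write transaction modest, so the TiFlash replica of the new table receives data incrementally instead of as one large snapshot.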

| username: redgame | Original post link

The error message mentioned an illegal region boundary, which may lead to data inconsistency. To prevent data inconsistency, TiFlash will exit.
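
To make the failed condition concrete, here is a minimal sketch that pulls the two ranges out of the error text and re-evaluates the condition printed in the log. It treats the boundaries as plain integers, which is a simplification of TiFlash's real row key comparison, and the function names are my own, not TiFlash's.

```python
import re


def parse_ranges(message):
    """Extract (range, file_range) as integer pairs from the error text."""
    match = re.search(
        r"range=\[(\d+),(\d+)\) file_range=\[(\d+),(\d+)\)", message
    )
    if match is None:
        raise ValueError("no range information found in message")
    r_start, r_end, f_start, f_end = map(int, match.groups())
    return (r_start, r_end), (f_start, f_end)


def ingest_check_passes(region_range, file_range):
    # Mirrors the condition shown in the log: the region range must start
    # at or before the file range and end at or after it.
    return region_range[0] <= file_range[0] and region_range[1] >= file_range[1]


msg = (
    "Detected illegal region boundary: "
    "range=[940850,950127) file_range=[940850,950128)."
)
region_range, file_range = parse_ranges(msg)
result = ingest_check_passes(region_range, file_range)
print(region_range, file_range, result)
```

With the values from the log, the start boundaries match, but the file range ends at 950128 while the region range ends at 950127, so the file extends one key past the region and the check fails.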

| username: cy6301567 | Original post link

It is best to use a more stable and reliable version rather than the first release of the latest version.
