TiFlash's Compute-Storage Separation Architecture Lacks Compatibility with OSS and Does Not Support LifeCycle Mode

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash存算分离架构对OSS的兼容性不足,不支持LifeCycle方式

| username: dba-kit

TiFlash's support for OSS is currently incomplete. With the default configuration parameters, the write node reports errors when it tries to delete historical data after a merge or split, so the data is never removed and the deletion is retried continuously.

The AccessKey (AK) in use has full-control permissions.

The TiFlash write node reports the following errors:

[2024/02/02 18:36:47.373 +08:00] [WARN] [S3Common.cpp:145] ["tag=AWSErrorMarshaller message=Encountered Unknown AWSError 'NotImplemented': A header you provided implies functionality that is not implemented."] [source=AWSClient] [thread_id=974]
[2024/02/02 18:36:47.373 +08:00] [ERROR] [S3Common.cpp:145] ["tag=AWSXmlClient message=HTTP response code: 400\nResolved remote host IP address: 100.118.78.2:443\nRequest ID: 65BCC5BFE7447A3130E9E56E\nException name: NotImplemented\nError message: Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented.\n7 response headers:\nconnection : close\ncontent-length : 334\ncontent-type : application/xml\ndate : Fri, 02 Feb 2024 10:36:47 GMT\nserver : AliyunOSS\nx-amz-request-id : 65BCC5BFE7447A3130E9E56E\nx-oss-server-time : 0"] [source=AWSClient] [thread_id=974]
[2024/02/02 18:36:47.374 +08:00] [ERROR] [S3Common.cpp:572] ["S3 PutEmptyObject failed: Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented., request_id=65BCC5BFE7447A3130E9E56E bucket=mysql-dts-migrate root=perf-asset-tiflash-data/ key=s1069816/data/t_80772/dmf_2.del"] [source="bucket=mysql-dts-migrate root=perf-asset-tiflash-data/"] [thread_id=974]

[2024/02/02 18:36:47.396 +08:00] [ERROR] [S3LockService.cpp:136] ["DB Exception: S3 PutEmptyObject failed, bucket=mysql-dts-migrate root=perf-asset-tiflash-data/ key=s1069816/data/t_80772/dmf_2.del s3error=UNKNOWN s3exception_name=NotImplemented s3msg=Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented. request_id=65BCC5BFDE10FD3531A4B28E\n\n       0x809e528\tDB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+134866216]\n                \tdbms/src/Common/Exception.h:53\n       0x808e9ce\tDB::Exception DB::S3::fromS3Error<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(Aws::S3::S3Error const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+134801870]\n                
\tdbms/src/Storages/S3/S3Common.h:57\n       0x808fb28\tDB::S3::uploadEmptyFile(DB::S3::TiFlashS3Client const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+134806312]\n                \tdbms/src/Storages/S3/S3Common.cpp:584\n       0x89f3cd9\tDB::S3::S3LockService::tryMarkDelete(disaggregated::TryMarkDeleteRequest const*, disaggregated::TryMarkDeleteResponse*) [tiflash+144653529]\n                \tdbms/src/Flash/Disaggregated/S3LockService.cpp:117\n       0x86fa5ea\tDB::FlashService::tryMarkDelete(grpc::ServerContext*, disaggregated::TryMarkDeleteRequest const*, disaggregated::TryMarkDeleteResponse*) [tiflash+141534698]\n                \tdbms/src/Flash/FlashService.cpp:874\n       0x98cb9d7\tgrpc::internal::RpcMethodHandler<tikvpb::Tikv::Service, disaggregated::TryMarkDeleteRequest, disaggregated::TryMarkDeleteResponse, google::protobuf::MessageLite, google::protobuf::MessageLite>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&) [tiflash+160217559]\n                \tcontrib/grpc/include/grpcpp/impl/codegen/method_handler.h:113\n       0x9242130\tgrpc::Server::SyncRequest::ContinueRunAfterInterception() [tiflash+153362736]\n                \tcontrib/grpc/src/cpp/server/server_cc.cc:433\n       0x9241f61\tgrpc::Server::SyncRequest::Run(std::__1::shared_ptr<grpc::Server::GlobalCallbacks> const&, bool) [tiflash+153362273]\n                \tcontrib/grpc/src/cpp/server/server_cc.cc:421\n       0x9254155\tgrpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) [tiflash+153436501]\n                \tcontrib/grpc/src/cpp/thread_manager/thread_manager.cc:36\n       0x95ee66a\tgrpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) 
[tiflash+157214314]\n                \tcontrib/grpc/src/core/lib/gprpp/thd_posix.cc:110\n  0x7f89afcecac3\t<unknown symbol> [libc.so.6+608963]\n  0x7f89afd7e850\t<unknown symbol> [libc.so.6+1206352]"] [thread_id=974]
[2024/02/02 18:36:47.396 +08:00] [ERROR] [S3LockClient.cpp:121] ["meets error, code=13 msg=S3 PutEmptyObject failed, bucket=mysql-dts-migrate root=perf-asset-tiflash-data/ key=s1069816/data/t_80772/dmf_2.del s3error=UNKNOWN s3exception_name=NotImplemented s3msg=Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented. request_id=65BCC5BFDE10FD3531A4B28E"] [source="<key=s1069816/data/t_80772/dmf_2,type=MarkDelete>"] [thread_id=966]
| username: dba-kit | Original post link

This is the monitoring during the error period; you can see that it keeps retrying.

| username: dba远航 | Original post link

This is probably a compatibility issue on the TiFlash side.

| username: 哈喽沃德 | Original post link

OSS isn’t fast, right?

| username: TiDBer_5Vo9nD1u | Original post link

Alibaba Cloud: public or private?

| username: xfworld | Original post link

After all, it only supports the S3-compatible protocol…

| username: WinterLiu | Original post link

Will this type of object storage be faster than local SSDs?

| username: dba-kit | Original post link

Later, following guidance from @flow-PingCAP, I found a workaround: add profiles.default.remote_gc_method: 2 to the write node configuration, so that objects are deleted using the original scan method instead of the lifecycle method.
The parameter description is:

the method of running GC task on the remote store. 1 - lifecycle, 2 - scan.
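
As a rough sketch, assuming the cluster is deployed with TiUP, the setting could be placed in the write node's entry in the topology file like this. Only `profiles.default.remote_gc_method: 2` comes from this thread. The host address is a placeholder, the surrounding layout is my assumption about a typical disaggregated deployment, and the existing `storage.s3` settings are omitted:

```yaml
# Sketch only: fragment of a TiUP topology file, not a complete configuration.
tiflash_servers:
  - host: 192.0.2.10                             # placeholder address for the write node
    config:
      flash.disaggregated_mode: tiflash_write    # assumed: this instance runs as a write node
      profiles.default.remote_gc_method: 2       # 1 - lifecycle, 2 - scan
```

The write node most likely needs its configuration reloaded (and a restart) before the change takes effect. After that, the `S3 PutEmptyObject failed` errors for `.del` keys should no longer appear in its log if the workaround applies to your setup.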
| username: dba-kit | Original post link

Indeed, but it can be bypassed through other means.

| username: xfworld | Original post link

Wow, hurry up and share your thoughts~ :+1::+1::+1:

| username: dba-kit | Original post link

You can check the test data in this post. The first query reads data directly from S3, so it is significantly slower. Once the data is cached, however, subsequent queries perform comparably to local SSDs.