TiFlash Error: Different Aggregation Mode Detected

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash报错 Different aggregation mode detected

| username: polars

【TiDB Usage Environment】Production Environment
【TiDB Version】6.5
【Reproduction Path】
TiFlash replicas are enabled for the database.
Execute the following SQL query:

SELECT DISTINCT t.CODE
FROM (
    SELECT DIM_VALUE_CODE CODE
    FROM MDM_DIM_VALUE_RELATION
    WHERE DIM_CODE = ?
      AND MDM_DIM_VALUE_RELATION.tenant_id = 1640317608640241665
    UNION ALL
    SELECT RELATION_DIM_VALUE_CODE CODE
    FROM MDM_DIM_VALUE_RELATION
    WHERE RELATION_DIM_CODE = ?
      AND MDM_DIM_VALUE_RELATION.tenant_id = 1640317608640241665
) t
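
For anyone trying to reproduce this, the replica state of the table can be checked before running the query. This is only a sketch: the post does not name the schema, so the filter uses the table name alone, and the name may need to match the case used when the table was created.

```sql
-- Lists the TiFlash replica count and sync status for the table used in the query.
-- AVAILABLE = 1 means the replica can serve reads; PROGRESS runs from 0 to 1.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'MDM_DIM_VALUE_RELATION';
```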

【Encountered Problem: Phenomenon and Impact】
SQL execution error: Cause: java.sql.SQLException: other error for mpp stream: From MPPquery:440763973823102983:2,task: Code: 0, e.displayText() = DB::TiFlashException: Different aggregation mode detected

【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】
TiFlash logs are as follows:

[ERROR] [MPPTask.cpp:429] ["task running meets error: Code: 0, e.displayText() = DB::TiFlashException: Different aggregation mode detected, e.what() = DB::TiFlashException, Stack trace:\n\n\n       0x1735c91\tDB::TiFlashException::TiFlashException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::TiFlashError const&) [tiflash+24337553]\n                \tdbms/src/Common/TiFlashException.h:250\n       0x6d996a9\tDB::AggregationInterpreterHelper::isFinalAgg(tipb::Aggregation const&) [tiflash+114923177]\n                \tdbms/src/Flash/Coprocessor/AggregationInterpreterHelper.cpp:61\n       0x6e36218\tDB::PhysicalAggregation::build(DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Logger> const&, tipb::Aggregation const&, DB::FineGrainedShuffle const&, std::__1::shared_ptr<DB::PhysicalPlanNode> const&) [tiflash+115565080]\n                \tdbms/src/Flash/Planner/plans/PhysicalAggregation.cpp:83\n       0x6e284fd\tDB::PhysicalPlan::build(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, tipb::Executor const*) [tiflash+115508477]\n                \tdbms/src/Flash/Planner/PhysicalPlan.cpp:123\n       0x6e2af32\tvoid DB::traverseExecutorTreePostOrder<DB::PhysicalPlan::build(tipb::DAGRequest const*)::$_2>(tipb::Executor const&, DB::PhysicalPlan::build(tipb::DAGRequest const*)::$_2&&)::'lambda'(tipb::Executor const&)::operator()(tipb::Executor const&) const [tiflash+115519282]\n                \tdbms/src/Flash/Statistics/traverseExecutors.h:113\n       0x6e27c0f\tDB::PhysicalPlan::build(tipb::DAGRequest const*) [tiflash+115506191]\n                \tdbms/src/Flash/Planner/PhysicalPlan.cpp:78\n       0x6e25303\tDB::Planner::execute() [tiflash+115495683]\n                \tdbms/src/Flash/Planner/Planner.cpp:36\n       0x6d107f9\tDB::(anonymous namespace)::executeDAG(DB::IQuerySource&, 
DB::Context&, bool) [tiflash+114362361]\n                \tdbms/src/Flash/executeQuery.cpp:88\n       0x6d102f0\tDB::executeQuery(DB::Context&, bool) [tiflash+114361072]\n                \tdbms/src/Flash/executeQuery.cpp:108\n       0x6df343b\tDB::MPPTask::runImpl() [tiflash+115291195]\n                \tdbms/src/Flash/Mpp/MPPTask.cpp:365\n       0x1804598\tauto DB::wrapInvocable<std::__1::function<void ()> >(bool, std::__1::function<void ()>&&)::'lambda'()::operator()() [tiflash+25183640]\n                \tdbms/src/Common/wrapInvocable.h:36\n       0x1807d93\tDB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&) [tiflash+25197971]\n                \tdbms/src/Common/DynamicThreadPool.cpp:101\n       0x18073f0\tDB::DynamicThreadPool::fixedWork(unsigned long) [tiflash+25195504]\n                \tdbms/src/Common/DynamicThreadPool.cpp:115\n       0x18084e2\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(unsigned long), DB::DynamicThreadPool*, unsigned long&>(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(unsigned long), DB::DynamicThreadPool*&&, unsigned long&)::'lambda'(auto&&...), DB::DynamicThreadPool*, unsigned long> >(void*) [tiflash+25199842]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7effc539bea5\tstart_thread [libpthread.so.0+32421]\n  0x7effc47a096d\t__clone [libc.so.6+1042797]"] [source=MPP<query:440777951262539792,task:2>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [ERROR] [MPPTask.cpp:429] ["task running meets error: Code: 0, e.displayText() = DB::TiFlashException: Different aggregation mode detected, e.what() = DB::TiFlashException, Stack trace:\n\n\n       0x1735c91\tDB::TiFlashException::TiFlashException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::TiFlashError const&) [tiflash+24337553]\n                \tdbms/src/Common/TiFlashException.h:250\n       0x6d996a9\tDB::AggregationInterpreterHelper::isFinalAgg(tipb::Aggregation const&) [tiflash+114923177]\n                \tdbms/src/Flash/Coprocessor/AggregationInterpreterHelper.cpp:61\n       0x6e36218\tDB::PhysicalAggregation::build(DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Logger> const&, tipb::Aggregation const&, DB::FineGrainedShuffle const&, std::__1::shared_ptr<DB::PhysicalPlanNode> const&) [tiflash+115565080]\n                \tdbms/src/Flash/Planner/plans/PhysicalAggregation.cpp:83\n       0x6e284fd\tDB::PhysicalPlan::build(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, tipb::Executor const*) [tiflash+115508477]\n                \tdbms/src/Flash/Planner/PhysicalPlan.cpp:123\n       0x6e2af32\tvoid DB::traverseExecutorTreePostOrder<DB::PhysicalPlan::build(tipb::DAGRequest const*)::$_2>(tipb::Executor const&, DB::PhysicalPlan::build(tipb::DAGRequest const*)::$_2&&)::'lambda'(tipb::Executor const&)::operator()(tipb::Executor const&) const [tiflash+115519282]\n                \tdbms/src/Flash/Statistics/traverseExecutors.h:113\n       0x6e27c0f\tDB::PhysicalPlan::build(tipb::DAGRequest const*) [tiflash+115506191]\n                \tdbms/src/Flash/Planner/PhysicalPlan.cpp:78\n       0x6e25303\tDB::Planner::execute() [tiflash+115495683]\n                \tdbms/src/Flash/Planner/Planner.cpp:36\n       0x6d107f9\tDB::(anonymous 
namespace)::executeDAG(DB::IQuerySource&, DB::Context&, bool) [tiflash+114362361]\n                \tdbms/src/Flash/executeQuery.cpp:88\n       0x6d102f0\tDB::executeQuery(DB::Context&, bool) [tiflash+114361072]\n                \tdbms/src/Flash/executeQuery.cpp:108\n       0x6df343b\tDB::MPPTask::runImpl() [tiflash+115291195]\n                \tdbms/src/Flash/Mpp/MPPTask.cpp:365\n       0x1804598\tauto DB::wrapInvocable<std::__1::function<void ()> >(bool, std::__1::function<void ()>&&)::'lambda'()::operator()() [tiflash+25183640]\n                \tdbms/src/Common/wrapInvocable.h:36\n       0x1807d93\tDB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&) [tiflash+25197971]\n                \tdbms/src/Common/DynamicThreadPool.cpp:101\n       0x18073f0\tDB::DynamicThreadPool::fixedWork(unsigned long) [tiflash+25195504]\n                \tdbms/src/Common/DynamicThreadPool.cpp:115\n       0x18084e2\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(unsigned long), DB::DynamicThreadPool*, unsigned long&>(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(unsigned long), DB::DynamicThreadPool*&&, unsigned long&)::'lambda'(auto&&...), DB::DynamicThreadPool*, unsigned long> >(void*) [tiflash+25199842]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7effc539bea5\tstart_thread [libpthread.so.0+32421]\n  0x7effc47a096d\t__clone [libc.so.6+1042797]"] [source=MPP<query:440777951262539792,task:4>] [thread_id=636]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTaskManager.cpp:152] ["Begin to abort query: 440777951262539792, abort type: ONERROR, reason: From MPP<query:440777951262539792,task:2>: Code: 0, e.displayText() = DB::TiFlashException: Different aggregation mode detected, e.what() = DB::TiFlashException,"] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTaskManager.cpp:152] ["Begin to abort query: 440777951262539792, abort type: ONERROR, reason: From MPP<query:440777951262539792,task:4>: Code: 0, e.displayText() = DB::TiFlashException: Different aggregation mode detected, e.what() = DB::TiFlashException,"] [thread_id=636]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTaskManager.cpp:195] ["Remaining task in query 440777951262539792 are: MPP<query:440777951262539792,task:1> MPP<query:440777951262539792,task:5> MPP<query:440777951262539792,task:4> MPP<query:440777951262539792,task:2> MPP<query:440777951262539792,task:3> "] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:471] ["Begin abort task: MPP<query:440777951262539792,task:1>, abort type: ONERROR"] [source=MPP<query:440777951262539792,task:1>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTaskManager.cpp:167] ["440777951262539792 already in abort process, skip abort"] [thread_id=636]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:500] ["Finish abort task from running"] [source=MPP<query:440777951262539792,task:1>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:471] ["Begin abort task: MPP<query:440777951262539792,task:5>, abort type: ONERROR"] [source=MPP<query:440777951262539792,task:5>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MinTSOScheduler.cpp:84] ["MPP<query:440777951262539792,task:5> is scheduled with miss or abort."] [thread_id=648]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:500] ["Finish abort task from running"] [source=MPP<query:440777951262539792,task:5>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:471] ["Begin abort task: MPP<query:440777951262539792,task:4>, abort type: ONERROR"] [source=MPP<query:440777951262539792,task:4>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:500] ["Finish abort task from running"] [source=MPP<query:440777951262539792,task:4>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:471] ["Begin abort task: MPP<query:440777951262539792,task:2>, abort type: ONERROR"] [source=MPP<query:440777951262539792,task:2>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTask.cpp:500] ["Finish abort task from running"] [source=MPP<query:440777951262539792,task:2>] [thread_id=533]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTaskManager.cpp:152] ["Begin to abort query: 440777951262539792, abort type: ONCANCELLATION, reason: Receive cancel request from TiDB"] [thread_id=415]
[2023/04/14 09:07:25.472 +08:00] [WARN] [MPPTaskManager.cpp:167] ["440777951262539792 already in abort process, skip abort"] [thread_id=415]
[2023/04/14 09:07:25.473 +08:00] [WARN] [MPPTask.cpp:471] ["Begin abort task: MPP<query:440777951262539792,task:3>, abort type: ONERROR"] [source=MPP<query:440777951262539792,task:3>] [thread_id=533]
[2023/04/14 09:07:25.473 +08:00] [WARN] [MPPTask.cpp:500] ["Finish abort task from running"] [source=MPP<query:440777951262539792,task:3>] [thread_id=533]
[2023/04/14 09:07:25.473 +08:00] [WARN] [MPPTaskManager.cpp:207] ["Finish abort query: 440777951262539792"] [thread_id=533]
| username: Billmay表妹 | Original post link

TiDB does not support all DDL statements that MySQL supports. Since DM uses the TiDB parser to process DDL statements, it only supports the DDL syntax that the TiDB parser supports. When you encounter DDL statements that TiDB does not support, you need to manually handle them using dmctl (skip the DDL statements or replace them with specified DDL statements).

| username: wenyi | Original post link

Cousin, where are the DDL statements in here?

| username: TiDBer_UUTlqVvZ | Original post link

Based on the information you provided, this issue is caused by TiFlash. Specifically, the problem arises due to different aggregation modes in TiFlash. TiFlash has two aggregation modes: Hash Aggregation and Stream Aggregation. When TiDB executes a query, if the aggregation mode in TiFlash is inconsistent with TiDB, this error will occur.

To resolve this issue, you can try the following steps:

Confirm whether the aggregation modes in TiDB and TiFlash are consistent. You can check the aggregation modes in TiDB and TiFlash by executing the following commands:

SHOW VARIABLES LIKE 'tidb_distsql_scan_concurrency';
SHOW VARIABLES LIKE 'tidb_opt_broadcast_join';

If the values of these two variables are inconsistent between TiDB and TiFlash, this error will occur. You can resolve the issue by modifying the values of these two variables in TiDB and TiFlash to make them consistent.

| username: polars | Original post link

Thank you for the reply. My TiDB version is 6.5. I checked the current values of these two system variables:

  • tidb_distsql_scan_concurrency is set to 15
  • tidb_opt_broadcast_join does not have a value. I checked the official TiDB documentation, and this version does not have this variable.

Please help confirm again, thank you!

| username: TiDBer_UUTlqVvZ | Original post link

Sorry, these two parameters are not related to this issue; I was mistaken.

First, confirm whether the aggregation mode in TiDB and TiFlash is consistent. Use the EXPLAIN command to view the execution plan of the query statement, then check the type of aggregation operator in the plan to determine the aggregation mode used.
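
As a sketch, running EXPLAIN on the query from the original post could look like the following. The literal 'DIM_A' is a hypothetical stand-in for the two `?` bind parameters, whose real values are not given in the thread.

```sql
-- 'DIM_A' is a placeholder for the two bind parameters (assumption, not from the post).
EXPLAIN
SELECT DISTINCT t.CODE
FROM (
    SELECT DIM_VALUE_CODE CODE
    FROM MDM_DIM_VALUE_RELATION
    WHERE DIM_CODE = 'DIM_A'
      AND MDM_DIM_VALUE_RELATION.tenant_id = 1640317608640241665
    UNION ALL
    SELECT RELATION_DIM_VALUE_CODE CODE
    FROM MDM_DIM_VALUE_RELATION
    WHERE RELATION_DIM_CODE = 'DIM_A'
      AND MDM_DIM_VALUE_RELATION.tenant_id = 1640317608640241665
) t;
```

In the output, the task column shows where each operator runs (for example root, cop[tikv], or mpp[tiflash]). Since the error is reported from an MPP task, the aggregation operators running as mpp[tiflash] are probably the ones worth comparing.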

In the execution plan, the operator types for Hash Aggregation and Stream Aggregation modes are HashAgg and StreamAgg, respectively. For example, the following is the execution plan of a query statement using Hash Aggregation:

id                  | count | task      | access object | operator info
------------------- | ----- | --------- | ------------- | ------------------------------
HashAgg_6           | 1     | root      |               | count(1)
└─TableReader_7     | 1     | root      | t             | data:TableFullScan_5
  └─TableFullScan_5 | 1     | cop[tikv] | table:t       | keep order:false, stats:pseudo

As you can see, this query statement uses the HashAgg operator for aggregation.

The following is the execution plan of a query statement using Stream Aggregation:

id                  | count | task      | access object | operator info
------------------- | ----- | --------- | ------------- | ------------------------------
StreamAgg_6         | 1     | root      |               | count(1)
└─TableReader_7     | 1     | root      | t             | data:TableFullScan_5
  └─TableFullScan_5 | 1     | cop[tikv] | table:t       | keep order:false, stats:pseudo

As you can see, this query statement uses the StreamAgg operator for aggregation.

Therefore, by checking the operator type in the execution plan, you can determine whether the query statement uses Hash Aggregation or Stream Aggregation.

If there is an inconsistency in the aggregation mode, you can force TiDB to use the same aggregation mode as TiFlash by modifying the tidb_opt_agg_push_down parameter in TiDB. Specifically, you can set this parameter to ON, which enables the aggregation push-down feature, thereby making TiDB use the same aggregation mode as TiFlash.
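
A minimal sketch of the suggested change, applied at session scope so that it does not affect other connections. Note that the TiDB documentation describes tidb_opt_agg_push_down as controlling whether the optimizer pushes aggregate functions down to a position before Join, Projection, and UnionAll; whether it has any effect on this particular error is not established in this thread.

```sql
-- Check the current value.
SHOW VARIABLES LIKE 'tidb_opt_agg_push_down';

-- Enable aggregation push-down for the current session only.
SET SESSION tidb_opt_agg_push_down = ON;
```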

Related link: System Variables (系统变量) | PingCAP Documentation Center

| username: polars | Original post link

Thank you for the reply!
Following your suggestion, I set the parameter, but the error persists. Please help analyze it further, thank you!
show variables like 'tidb_opt_agg_push_down';

| username: jansu-dev | Original post link

@flow-PingCAP, could you please take a look at this issue when you have time? I'm not sure whether it is related to this one, thank you: Replace if (unlikely(cond)) throw Exception with RUNTIME_CHECK · Issue #5527 · pingcap/tiflash (github.com)

| username: wenyi | Original post link

The error still occurs after setting tidb_opt_agg_push_down to ON, and the error message is the same as before.
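
One way to narrow this down further, not suggested by anyone in the thread and offered here only as an assumption, is to rerun the query in a session that does not read from TiFlash at all. If the query then succeeds, the failure is likely specific to the plan sent to TiFlash rather than to the data or the SQL itself.

```sql
-- The default value is 'tikv,tiflash,tidb'; removing 'tiflash' keeps this session off TiFlash.
SET SESSION tidb_isolation_read_engines = 'tikv,tidb';

-- Rerun the failing query here and compare the result.

-- Restore the default afterwards.
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash,tidb';
```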