Note: This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 【十分钟成为 TiFlash Contributor】TiFlash 函数下推必知必会 ("Ten Minutes to Become a TiFlash Contributor: TiFlash Function Pushdown Essentials")
Author: Huang Haisheng, TiFlash R&D Engineer
Since TiFlash was open-sourced, it has garnered widespread attention from the community. Many enthusiasts have learned about the design principles behind TiFlash through source code reading activities. Additionally, many are eager to contribute to TiFlash, leading to the creation of the “Ten Minutes to Become a TiFlash Contributor” series. We will share everything about TiFlash, from principles to practice!
This article provides detailed information about TiFlash pushdown functions. We have also selected some related issues: https://github.com/pingcap/tiflash/issues/5092. We hope you can complete these challenges after reading this article and earn TiDB Contributor exclusive souvenirs!
Background Knowledge
As an essential part of the TiDB HTAP system, TiFlash receives and executes operators pushed down by TiDB. Sometimes, operators like Projection and Selection contain functions, meaning that to push down these operators, TiFlash must support executing the functions within them.
As shown in the figure above, if an operator contains a function not supported by TiFlash, a series of operators cannot be pushed down to TiFlash for execution. To maximize the parallel computing capabilities of TiFlash MPP, we need TiFlash to support all functions of TiDB. Seemingly trivial function support is a crucial part of TiDB HTAP!
Step-by-Step Guide to Pushdown Functions
1. Confirm the Behavior of the Function to be Pushed Down
The function is pushed down by TiDB to be executed by TiFlash, so the logic executed in TiFlash must be consistent with TiDB, including:
- Main logic
- Return value type
- Exception handling
- etc.
For example, the sqrt function in TiDB always returns float64. Even if the parameter is of Decimal type, TiDB evaluates it internally as a real number through evalReal. In contrast, floor and ceil determine the return value type based on the parameter's type and size.
Generally, it is relatively simple for TiFlash to be consistent with TiDB. However, some special inputs need extra attention during implementation. For example, what should sqrt of a negative number return: NaN, Null, or an exception?
Therefore, before actual development, it is essential to thoroughly review how TiDB implements this function.
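To make the sqrt question concrete, here is a minimal sketch. It assumes the MySQL-documented behavior, where SQRT of a negative number returns NULL, and that TiDB follows it; confirm this against the TiDB source before relying on it. The helper sqrtLikeMySQL is hypothetical and is not TiFlash code. It models a nullable SQL double with std::optional.

```cpp
#include <cmath>
#include <optional>

// Hypothetical helper, not part of TiFlash.
// std::nullopt stands for SQL NULL.
std::optional<double> sqrtLikeMySQL(std::optional<double> arg)
{
    if (!arg.has_value())
        return std::nullopt; // NULL in, NULL out
    if (*arg < 0)
        return std::nullopt; // assumed behavior: negative input yields NULL, not NaN or an error
    return std::sqrt(*arg);
}
```

The point of the sketch is that the special-input decision is an explicit branch, so it can be checked line by line against the TiDB implementation.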
2. Map TiDB Function to TiFlash Function
TiDB identifies functions using tipb::ScalarFuncSig, while TiFlash uses func_name as the identifier.
In TiFlash code, we use a mapping table to map tipb::ScalarFuncSig to func_name.
The second step in pushing down a new function is to assign a func_name to the function in TiFlash and add a mapping from tipb::ScalarFuncSig to func_name in the corresponding mapping table.
Typically, SQL functions are divided into window functions, aggregate functions, distinct aggregate functions, and scalar functions. TiFlash maintains a mapping table for each type of function, as follows:
- window_func_map: for window functions
- agg_func_map: for regular aggregate functions
- distinct_agg_func_map: for distinct aggregate functions
- scalar_func_map: for general scalar functions
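The idea of such a mapping table can be sketched as follows. This is a simplified illustration, not the TiFlash source: the enum ScalarFuncSig below is a hypothetical stand-in for tipb::ScalarFuncSig, and the table name scalar_func_map_sketch, the helper lookupFuncName, and the func_name strings are assumptions made for the example.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for tipb::ScalarFuncSig (the real enum comes from tipb).
enum class ScalarFuncSig
{
    Sqrt,
    SubstringIndex,
    Unmapped,
};

// Simplified analogue of a signature -> func_name mapping table.
// The func_name strings are illustrative only.
const std::unordered_map<ScalarFuncSig, std::string> scalar_func_map_sketch = {
    {ScalarFuncSig::Sqrt, "sqrt"},
    {ScalarFuncSig::SubstringIndex, "substringIndex"},
};

// Returns the func_name for a signature, or std::nullopt when the
// signature has no mapping, i.e. the function cannot be executed.
std::optional<std::string> lookupFuncName(ScalarFuncSig sig)
{
    auto it = scalar_func_map_sketch.find(sig);
    if (it == scalar_func_map_sketch.end())
        return std::nullopt;
    return it->second;
}
```

Supporting a new function then amounts to adding one more entry to the table that matches its function type.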
3. Register TiFlash Function
After mapping tipb::ScalarFuncSig to func_name, the function pushed down by TiDB will find the corresponding builder in TiFlash based on func_name. The TiFlash Function will then execute the function logic in the actual execution flow.
Currently, there are two ways to implement Function Builder in TiFlash: reuse function and create function directly.
Reuse Function
Reuse function applies when the new function can be expressed with existing functions. For example, ifNull(arg1, arg2) -> if(isNull(arg1), arg2, arg1). Writing a dedicated ifNull implementation would be time-consuming, whereas this method reuses the logic of other functions.
In TiFlash, DAGExpressionAnalyzerHelper::function_builder_map records which functions are reused and how to reuse them.
Add a corresponding DAGExpressionAnalyzerHelper::FunctionBuilder and add the mapping <func_name, FunctionBuilder> to DAGExpressionAnalyzerHelper::function_builder_map.
Refer to other FunctionBuilder implementations in DAGExpressionAnalyzerHelper for specific implementation details.
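The ifNull rewrite can be sketched as a small expression-tree transformation. This is only an illustration of the idea: the Expr type and the helpers makeColumn, makeCall, buildIfNull, and toString are hypothetical, and the real FunctionBuilder in DAGExpressionAnalyzerHelper works on TiFlash's own types with a different signature.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node, not a TiFlash type.
struct Expr
{
    std::string name;                        // function name or column name
    std::vector<std::shared_ptr<Expr>> args; // empty for a column reference
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeColumn(const std::string & name)
{
    return std::make_shared<Expr>(Expr{name, {}});
}

ExprPtr makeCall(const std::string & name, std::vector<ExprPtr> args)
{
    return std::make_shared<Expr>(Expr{name, std::move(args)});
}

// Rewrites ifNull(arg1, arg2) into if(isNull(arg1), arg2, arg1),
// so that only the existing "if" and "isNull" functions are executed.
ExprPtr buildIfNull(const ExprPtr & arg1, const ExprPtr & arg2)
{
    return makeCall("if", {makeCall("isNull", {arg1}), arg2, arg1});
}

// Renders an expression tree as text, for inspection.
std::string toString(const ExprPtr & expr)
{
    if (expr->args.empty())
        return expr->name;
    std::string out = expr->name + "(";
    for (std::size_t i = 0; i < expr->args.size(); ++i)
    {
        if (i > 0)
            out += ", ";
        out += toString(expr->args[i]);
    }
    return out + ")";
}
```

With these helpers, toString(buildIfNull(makeColumn("a"), makeColumn("b"))) produces the rewritten form if(isNull(a), b, a).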
Create Function Directly
Create function directly applies when no existing function can be reused. Implement the function code under dbms/src/Functions. Functions are usually grouped by category, such as String-related functions in FunctionString.
Then call factory.registerFunction to register the function implementation class in FunctionFactory. Calls to factory.registerFunction are usually grouped together, so they should be easy to find.
4. Develop Function on TiFlash Side
Next, develop the main body of the function on the TiFlash side. If existing TiFlash functions cannot be reused, you need to inherit the IFunction interface to develop a function. Fortunately, ClickHouse already has many ready-made functions. Because they may not be compatible with TiDB/MySQL, they are kept under Functions for future use.
When inheriting IFunction to implement a function, first check whether a ClickHouse function with the same semantics already exists under Functions. If so, modify it to be compatible with TiDB/MySQL and incorporate it into the TiFlash Function system.
If there is no suitable ClickHouse function, develop a vectorized function from scratch. Although developing vectorized functions is relatively challenging, you can find some patterns and development paradigms from other functions.
TiFlash vs. TiDB
There are differences in vectorized function implementation between TiFlash and TiDB. Contributors who have participated in TiDB contributions should note:
- Differences between C++ and Golang
  - TiFlash heavily uses C++ templates, especially for data type-related code.
- Differences in vectorized function systems between TiFlash and TiDB
  - The design and usage of expression-related classes differ significantly from TiDB.
    - IDataType
    - IColumn
  - The combination of parameter Column types (vector and const) grows exponentially. For example, a function with two parameters has four combinations:
    - vector, const
    - vector, vector
    - const, vector
    - const, const
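The four vector/const combinations for a two-parameter function can be illustrated with a small self-contained sketch. The types VectorCol, ConstCol, and Column and the function plusColumns are hypothetical simplifications; TiFlash's real IColumn hierarchy is different and typically handles these cases with templates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical column forms, not TiFlash types.
using VectorCol = std::vector<long long>; // one value per row
struct ConstCol
{
    long long value; // the same value for every row
};
using Column = std::variant<VectorCol, ConstCol>;

// Adds two columns row by row, handling all four combinations explicitly.
std::vector<long long> plusColumns(const Column & a, const Column & b, std::size_t rows)
{
    const auto * av = std::get_if<VectorCol>(&a);
    const auto * bv = std::get_if<VectorCol>(&b);
    const auto * ac = std::get_if<ConstCol>(&a);
    const auto * bc = std::get_if<ConstCol>(&b);
    if ((av && av->size() != rows) || (bv && bv->size() != rows))
        throw std::invalid_argument("vector column size must equal row count");

    std::vector<long long> result(rows);
    if (av && bv) // vector, vector
        for (std::size_t i = 0; i < rows; ++i)
            result[i] = (*av)[i] + (*bv)[i];
    else if (av && bc) // vector, const
        for (std::size_t i = 0; i < rows; ++i)
            result[i] = (*av)[i] + bc->value;
    else if (ac && bv) // const, vector
        for (std::size_t i = 0; i < rows; ++i)
            result[i] = ac->value + (*bv)[i];
    else // const, const
        for (std::size_t i = 0; i < rows; ++i)
            result[i] = ac->value + bc->value;
    return result;
}
```

Each additional parameter doubles the number of branches, which is why a unit test needs to cover every combination rather than only the vector, vector case.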
These differences make function development in TiFlash somewhat challenging and quite different from TiDB. Refer to the implementation of other functions in the Functions directory, such as FunctionSubStringIndex. You will gain many insights as you develop functions.
Reference Function Implementations
5. Pushdown Function on TiDB Side
Function pushdown is initiated from the TiDB side, so TiDB also needs some modifications to enable it. In expression/expression.go, scalarExprSupportedByFlash determines which functions can be pushed down to TiFlash. The TiDB planner decides whether an operator can be pushed down to TiFlash based on scalarExprSupportedByFlash.
For example, to push down the sqrt function to TiFlash, find the scalarExprSupportedByFlash function in TiDB's expression/expression.go. You will see that all functions that can be pushed down are hard-coded into various switch cases. Add the sqrt function to the appropriate switch case.
6. Verify Function Pushdown
After completing the development on both TiDB and TiFlash sides, verify the entire pushdown process locally.
Deploy Local Cluster
Method 1: Use TiUP to Deploy Locally Built TiDB and TiFlash Binaries
First, build the TiFlash and TiDB binaries locally, then use TiUP to start a cluster for testing:
tiup playground nightly --db.binpath ${my_tidb} --tiflash.binpath ${my_tiflash}
By default, this will start a cluster with 1 PD, 1 TiKV, 1 TiDB, and 1 TiFlash. The nightly version is the daily build of the master branch. Use db.binpath and tiflash.binpath to specify the locally built TiDB and TiFlash. Refer to Quickly Deploy TiDB Cluster Locally for more details.
Method 2: Debug Function Execution Process in IDE and Replace TiDB and TiFlash Using Kill
- First, start a TiDB, TiKV, TiFlash, and PD cluster locally. Follow the official documentation to install TiUP and start the cluster using playground:
tiup playground nightly
By default, this will start a cluster with 1 PD, 1 TiKV, 1 TiDB, and 1 TiFlash. The nightly version is the daily build of the master branch.
- Then replace the running TiDB and TiFlash processes with the locally built binaries:
  - TiFlash
    Run ps -ef | grep tiflash to find the TiFlash process, which should look like this:
    xzx 11238 11028 52 20:20 pts/0 00:00:05 /home/xzx/.tiup/components/tiflash/v5.0.0-nightly-20210706/tiflash/tiflash server --config-file=/home/xzx/.tiup/data/ScRdWJM/tiflash-0/tiflash.toml
    Note the process ID 11238 and the arguments following the tiflash binary: server --config-file=/home/xzx/.tiup/data/ScRdWJM/tiflash-0/tiflash.toml.
    Then run kill 11238 and start the locally built TiFlash with the same arguments: server --config-file=/home/xzx/.tiup/data/ScRdWJM/tiflash-0/tiflash.toml.
  - TiDB
    Similar to TiFlash: find the TiDB process started by TiUP, kill it, and start the locally built TiDB with the same arguments.
Verify Pushdown Process
Use queries like explain select sum(sqrt(x)) from test to see if the function is pushed down to TiFlash for computation.
Create TiFlash replica:
create table test.t (xxx);
-- Since usually only one node is started locally, set TiFlash replica to 1
alter table test.t set tiflash replica 1;
Test SQL can be like this:
-- Prefer MPP
set tidb_enforce_mpp=1;
-- Force to use only TiFlash
set tidb_isolation_read_engines='tiflash';
explain select xxxfunc(a) from t;
If the function is pushed down to TiFlash, the explain result will show the Projection operator containing the function on the TiFlash side. Execute the explain SQL multiple times as TiFlash replica creation takes some time, but not too long. If the function is not pushed down after a long time, there might be an issue.
After the explain SQL executes successfully, remove the explain and execute the SQL to see the effect.
7. Testing
After submitting the PR, the GitHub CI for TiFlash will start an actual TiDB, TiFlash, PD, and TiKV cluster to automatically run unit and integration tests. Contributors need to prepare the test code in advance.
Integration Testing
For function pushdown, usually add a set of tests to the integration tests. Create a func.test for the new pushdown function under tests/fullstack-test/expr, referring to other function tests in the same directory, such as substring_index.test.
Unit Testing
Format
TiFlash function unit tests are placed under dbms/src/Functions/tests. The naming format is usually gtest_${func_name}.cpp.
The unit test template is as follows:
#include <TestUtils/FunctionTestUtils.h>
#include <TestUtils/TiFlashTestBasic.h>

namespace DB::tests
{
// Test fixture for the function under test.
class {gtest_name} : public DB::tests::FunctionTest
{
};

TEST_F({gtest_name}, {gtest_unit_name})
try
{
    const String & func_name = {function_name};
    // case1: compare the expected column with the actual result
    ASSERT_COLUMN_EQ(
        {output_result},
        executeFunction(
            func_name,
            {input_1},
            {input_2},
            ...,
            {input_n}));
    // case2
    ...
    // case3
    ...
}
CATCH

TEST_F({gtest_name}, {gtest_unit_name2})...
TEST_F({gtest_name}, {gtest_unit_name3})...
...
} // namespace DB::tests
Refer to other function unit tests in the directory and make appropriate adjustments.
FunctionTestUtils is a common utility for function testing, providing various commonly used methods such as createColumn. If you find other reusable methods while writing gtests, you can add them here.
Content
For a function like function(arg_1, arg_2, arg_3, … arg_n), a TiFlash function unit test should at least include the following parts:
Data Types
For each arg_i’s supported types, test Type and Nullable(Type). Although theoretically, all arg_i should support DataTypeNullable(DataTypeNothing), TiDB rarely uses DataTypeNullable(DataTypeNothing), so note related bugs if encountered.
Column Types
For each arg_i’s type:
- If the type is not nullable, test two forms of columns:
  - ColumnVector
  - ColumnConst
- If the type is nullable, test three forms of columns:
  - ColumnVector
  - ColumnConst(ColumnNullable(non-null value))
  - ColumnConst(ColumnNullable(null value))
- If the type is DataTypeNullable(DataTypeNothing), test two forms of columns:
  - ColumnVector
  - ColumnConst(ColumnNullable(null value))
Boundary Values
Some common boundary value examples are:
- Numeric types (int, double, decimal, etc.): max/min values, 0 value, null value
- String types: empty string, non-ASCII characters like Chinese, null value, with/without collation
- Date types: zero date, dates before 1970-01-01, daylight saving time, null value
For specific functions, construct boundary values based on their specific implementation.
Return Value Types
Ensure TiFlash function return value types are consistent with MySQL/TiDB according to MySQL documentation.
Note:
- Decimal types in TiFlash have four internal representations: Decimal32, Decimal64, Decimal128, and Decimal256. Test all four for all Decimal types.
- The possible types for each arg_i should be based on the types TiDB might push down. Considering the difficulty of obtaining this information, write tests based on the types currently supported by TiFlash.
- Some TiDB pushdown functions have function signatures containing type information, such as EQInt, EQReal, EQString, EQDecimal, EQTime, EQDuration, EQJson for a = b. Although a and b can be int/real/string/decimal/time/duration/json, TiDB ensures a and b are of the same type when pushing down. For now, only test equal functions for the same type, like int = int, decimal = decimal.
- For functions with potentially infinite input parameters (e.g., case when), ensure the minimum loop unit is tested.
- Expect to find many bugs during testing. Fix easy-to-fix bugs while testing. For difficult or uncertain bugs, open an issue and comment out the corresponding test.
Common Issues
- Even if a function returns null, assign a meaningful value to its corresponding nestedColumn
In TiFlash function implementations, there is an overloadable function: useDefaultImplementationForNulls. For most functions, if no special handling for null is needed, return true. This way, no null-related considerations are needed when implementing the function. The principle is that IExecutableFunction::defaultImplementationForNulls will extract the nestedColumn of the nullable column and pass it to the function, and the nestedColumn is always of a not-null type.
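The default null handling described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual IExecutableFunction::defaultImplementationForNulls: NullableColumn, plusOneNested, and executeWithDefaultNulls are hypothetical names, and the null map convention (1 means NULL) is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nullable column: nested values plus a null map (1 = NULL).
struct NullableColumn
{
    std::vector<double> nested;
    std::vector<std::uint8_t> null_map;
};

// The function body sees only the not-null nested values
// and contains no null-related logic.
std::vector<double> plusOneNested(const std::vector<double> & input)
{
    std::vector<double> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        output[i] = input[i] + 1.0;
    return output;
}

// Simplified default null handling: unwrap, compute, re-apply the null map.
NullableColumn executeWithDefaultNulls(const NullableColumn & arg)
{
    NullableColumn result;
    result.nested = plusOneNested(arg.nested); // runs on every row, including rows marked NULL
    result.null_map = arg.null_map;            // NULL rows in the argument stay NULL in the result
    return result;
}
```

Note that the function body still runs on the nested value of rows marked NULL. In this sketch, that is why the nested value behind a NULL should hold a well-defined value rather than arbitrary data.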
For functions requiring special null handling, like concat_ws, which needs