Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 【实用指南】快速给 TiDB 新增一个功能 (Practical Guide: Quickly Adding a New Feature to TiDB)
Author: Chen Shuang
PS: If you are participating in the TiDB product group and adding new features to TiDB components, this article is worth a look.
TiDB Hackathon 2022 is coming soon. This article walks step by step through adding a new feature to TiDB, so that readers without much background knowledge can do it quickly.
Suppose we want to import an SST file into TiDB by adding the LOAD SST FILE <file_path> syntax.
When TiDB receives an SQL request, the general execution process is to generate an AST syntax tree → generate an execution plan → construct an Executor and execute it. Let’s first implement the syntax.
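The three stages can be pictured with a toy pipeline. This is only a sketch of the flow described above, not TiDB code: the stmt and plan types and the parse, buildPlan, execute, and run functions are hypothetical names invented for illustration, and the parsing is deliberately naive.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stmt stands in for an AST node (hypothetical, not a TiDB type).
type stmt struct{ path string }

// plan stands in for an execution plan built from the AST (hypothetical).
type plan struct{ path string }

// parse turns SQL text into the toy AST. It only understands
// LOAD SST FILE '<path>' and splits on whitespace.
func parse(sql string) (*stmt, error) {
	fields := strings.Fields(sql)
	if len(fields) != 4 || !strings.EqualFold(strings.Join(fields[:3], " "), "LOAD SST FILE") {
		return nil, errors.New("syntax error")
	}
	// Strip the surrounding quotes and an optional trailing semicolon.
	return &stmt{path: strings.Trim(fields[3], "';")}, nil
}

// buildPlan turns the AST into an execution plan.
func buildPlan(s *stmt) *plan { return &plan{path: s.path} }

// execute runs the plan; here it only reports what it would do.
func execute(p *plan) string { return "loading " + p.path }

// run chains the three stages: AST, then plan, then execution.
func run(sql string) (string, error) {
	s, err := parse(sql)
	if err != nil {
		return "", err
	}
	return execute(buildPlan(s)), nil
}

func main() {
	out, err := run("load sst file 'table0.sst';")
	fmt.Println(out, err)
}
```

Each of the following sections fills in one of these stages inside TiDB itself.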
Syntax Implementation
How do we implement the syntax? We can follow the example of the similar LOAD DATA syntax and start from there.
Step-1: Add AST Syntax Tree
The LOAD DATA syntax is represented by ast.LoadDataStmt. Similarly, we add a LoadSSTFileStmt AST syntax tree in tidb/parser/ast/dml.go:
// LoadSSTFileStmt is a statement to load sst file.
type LoadSSTFileStmt struct {
	dmlNode

	Path string
}

// Restore implements Node interface.
func (n *LoadSSTFileStmt) Restore(ctx *format.RestoreCtx) error {
	ctx.WriteKeyWord("LOAD SST FILE ")
	ctx.WriteString(n.Path)
	return nil
}

// Accept implements Node Accept interface.
func (n *LoadSSTFileStmt) Accept(v Visitor) (Node, bool) {
	newNode, _ := v.Enter(n)
	return v.Leave(newNode)
}
The Restore method restores the corresponding SQL statement from the AST syntax tree. The Accept method lets other tools traverse this AST syntax tree. For example, during preprocessing TiDB uses the Accept method to traverse all nodes in the AST syntax tree.
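To make the roles of Restore and Accept concrete, here is a standalone sketch. The Node and Visitor interfaces below are simplified stand-ins that only mirror the shape used above; Restore writes to a strings.Builder instead of TiDB's format.RestoreCtx, the quoting is naive (no escaping), and pathCollector and restoreSQL are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// Node and Visitor are simplified stand-ins for the parser's interfaces.
type Node interface {
	Accept(v Visitor) (Node, bool)
	Restore(sb *strings.Builder)
}

type Visitor interface {
	Enter(n Node) (node Node, skipChildren bool)
	Leave(n Node) (node Node, ok bool)
}

type LoadSSTFileStmt struct{ Path string }

// Accept lets a visitor see this node. The statement has no children,
// so it calls Enter and then Leave directly.
func (n *LoadSSTFileStmt) Accept(v Visitor) (Node, bool) {
	newNode, _ := v.Enter(n)
	return v.Leave(newNode)
}

// Restore writes the SQL text back out from the node's fields.
func (n *LoadSSTFileStmt) Restore(sb *strings.Builder) {
	sb.WriteString("LOAD SST FILE ")
	sb.WriteString("'" + n.Path + "'") // naive quoting, no escaping
}

// pathCollector is a hypothetical visitor that records every SST path it sees.
type pathCollector struct{ paths []string }

func (c *pathCollector) Enter(n Node) (Node, bool) {
	if s, ok := n.(*LoadSSTFileStmt); ok {
		c.paths = append(c.paths, s.Path)
	}
	return n, false
}

func (c *pathCollector) Leave(n Node) (Node, bool) { return n, true }

// restoreSQL is a small helper that renders a node back to SQL text.
func restoreSQL(n Node) string {
	var sb strings.Builder
	n.Restore(&sb)
	return sb.String()
}

func main() {
	n := &LoadSSTFileStmt{Path: "table0.sst"}
	c := &pathCollector{}
	n.Accept(c)
	fmt.Println(c.paths, restoreSQL(n))
}
```

The visitor never needs to know how the statement is stored; it only relies on Accept calling Enter and Leave, which is why every new AST node has to implement it.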
Step-2: Add Syntax
The LOAD DATA syntax is implemented through LoadDataStmt. Similarly, we add the LoadSSTFileStmt syntax in tidb/parser/parser.y. Several places need to be modified, as shown in the git diff below:
diff --git a/parser/parser.y b/parser/parser.y
index 1539bb13db..079859e8a9 100644
--- a/parser/parser.y
+++ b/parser/parser.y
@@ -243,6 +243,7 @@ import (
sqlCalcFoundRows "SQL_CALC_FOUND_ROWS"
sqlSmallResult "SQL_SMALL_RESULT"
ssl "SSL"
+ sst "SST"
starting "STARTING"
statsExtended "STATS_EXTENDED"
straightJoin "STRAIGHT_JOIN"
@@ -908,6 +909,7 @@ import (
IndexAdviseStmt "INDEX ADVISE statement"
KillStmt "Kill statement"
LoadDataStmt "Load data statement"
+ LoadSSTFileStmt "Load sst file statement"
LoadStatsStmt "Load statistic statement"
LockTablesStmt "Lock tables statement"
NonTransactionalDeleteStmt "Non-transactional delete statement"
@@ -11324,6 +11326,7 @@ Statement:
| IndexAdviseStmt
| KillStmt
| LoadDataStmt
+| LoadSSTFileStmt
| LoadStatsStmt
| PlanReplayerStmt
| PreparedStmt
@@ -13496,6 +13499,14 @@ LoadDataStmt:
$$ = x
}
+LoadSSTFileStmt:
+	"LOAD" "SST" "FILE" stringLit
+	{
+		$$ = &ast.LoadSSTFileStmt{
+			Path: $4,
+		}
+	}
+
In the above modifications:
- Line 9 registers a new keyword SST because it is a new keyword in the syntax.
- Lines 17 and 25 register a new syntax called LoadSSTFileStmt.
- Lines 33-40 define the LoadSSTFileStmt syntax structure as LOAD SST FILE <file_path>. The first three keywords are fixed, so we directly define "LOAD" "SST" "FILE". The fourth is the file path, a variable value. We use stringLit to extract this variable value and then use it to initialize ast.LoadSSTFileStmt, where $4 refers to the value of the fourth symbol, stringLit.
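For readers unfamiliar with yacc-style grammars, the rule can be read as the hand-written check below. This is a sketch under simplifying assumptions, not what the generated parser looks like: the keywords map, the token type, and the tokenize and parseLoadSSTFile functions are hypothetical, and the tokenizer splits on whitespace, so quoted paths containing spaces are not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// keywords plays the role of keyword registration: a word only becomes a
// keyword token if it is listed here (hypothetical table).
var keywords = map[string]bool{"LOAD": true, "SST": true, "FILE": true}

type token struct {
	kind string // "keyword", "string", or "ident"
	val  string
}

// tokenize splits on whitespace and classifies each word.
func tokenize(sql string) []token {
	sql = strings.TrimSuffix(strings.TrimSpace(sql), ";")
	var toks []token
	for _, f := range strings.Fields(sql) {
		switch {
		case len(f) >= 2 && strings.HasPrefix(f, "'") && strings.HasSuffix(f, "'"):
			toks = append(toks, token{kind: "string", val: f[1 : len(f)-1]})
		case keywords[strings.ToUpper(f)]:
			toks = append(toks, token{kind: "keyword", val: strings.ToUpper(f)})
		default:
			toks = append(toks, token{kind: "ident", val: f})
		}
	}
	return toks
}

type LoadSSTFileStmt struct{ Path string }

// parseLoadSSTFile mirrors the rule: three fixed keywords followed by a
// string literal, whose value (the "$4" of the rule) becomes Path.
func parseLoadSSTFile(sql string) (*LoadSSTFileStmt, error) {
	toks := tokenize(sql)
	if len(toks) != 4 {
		return nil, errors.New("expected four tokens")
	}
	for i, kw := range []string{"LOAD", "SST", "FILE"} {
		if toks[i].kind != "keyword" || toks[i].val != kw {
			return nil, fmt.Errorf("expected keyword %s at position %d", kw, i+1)
		}
	}
	if toks[3].kind != "string" {
		return nil, errors.New("expected a string literal at position 4")
	}
	return &LoadSSTFileStmt{Path: toks[3].val}, nil
}

func main() {
	stmt, err := parseLoadSSTFile("load sst file 'table0.sst';")
	fmt.Println(stmt, err)
}
```

If "SST" were missing from the keyword table, the second word would be classified as an ordinary identifier and the rule could never match, which is why the new keyword has to be registered as well as used.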
Since a new keyword SST is introduced, we also need to add this keyword in tidb/parser/misc.go:
diff --git a/parser/misc.go b/parser/misc.go
index 140619bb07..418e9dd6a4 100644
--- a/parser/misc.go
+++ b/parser/misc.go
@@ -669,6 +669,7 @@ var tokenMap = map[string]int{
"SQL_TSI_YEAR": sqlTsiYear,
"SQL": sql,
"SSL": ssl,
+ "SST": sst,
"STALENESS": staleness,
"START": start,
"STARTING": starting,
Step-3: Compile and Test
Compile to generate the new parser files.
cd parser
make fmt # Format the code
make # Compile to generate the new parser files
We can add a test case to TestDMLStmt in the tidb/parser/parser_test.go file to verify that our new syntax works. Below is the git diff showing the modifications:
diff --git a/parser/parser_test.go b/parser/parser_test.go
index 7093c3889f..d2c75c4c59 100644
--- a/parser/parser_test.go
+++ b/parser/parser_test.go
@@ -666,6 +666,9 @@ func TestDMLStmt(t *testing.T) {
{"LOAD DATA LOCAL INFILE '/tmp/t.csv' IGNORE INTO TABLE t1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';", true, "LOAD DATA LOCAL INFILE '/tmp/t.csv' IGNORE INTO TABLE `t1` FIELDS TERMINATED BY ','"},
{"LOAD DATA LOCAL INFILE '/tmp/t.csv' REPLACE INTO TABLE t1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';", true, "LOAD DATA LOCAL INFILE '/tmp/t.csv' REPLACE INTO TABLE `t1` FIELDS TERMINATED BY ','"},
+ // load sst file test
+ {"load sst file 'table0.sst'", true, "LOAD SST FILE 'table0.sst'"},
+
Then run the tests:
cd parser
make test # Run all parser tests. For quick verification, use the command go test -run="TestDMLStmt" to run only the modified TestDMLStmt test
Generate Execution Plan
After TiDB generates the AST syntax tree, it needs to generate the corresponding execution plan, so we need to define an execution plan for LOAD SST FILE. Following the same approach, we find the LOAD DATA execution plan LoadData in tidb/planner/core/common_plans.go and define the LoadSSTFile execution plan:
// LoadSSTFile represents a load sst file plan.
type LoadSSTFile struct {
	baseSchemaProducer

	Path string
}
To allow TiDB to generate the corresponding LoadSSTFile execution plan based on the ast.LoadSSTFileStmt syntax tree, we need to implement our buildLoadSSTFile method in the tidb/planner/core/planbuilder.go file, referring to the buildLoadData method. Below is the git diff showing the modifications:
diff --git a/planner/core/planbuilder.go b/planner/core/planbuilder.go
index ad7ce64748..c68e992b35 100644
--- a/planner/core/planbuilder.go
+++ b/planner/core/planbuilder.go
@@ -734,6 +734,8 @@ func (b *PlanBuilder) Build(ctx context.Context, node ast.Node) (Plan, error) {
return b.buildInsert(ctx, x)
case *ast.LoadDataStmt:
return b.buildLoadData(ctx, x)
+ case *ast.LoadSSTFileStmt:
+ return b.buildLoadSSTFile(x)
@@ -3979,6 +3981,13 @@ func (b *PlanBuilder) buildLoadData(ctx context.Context, ld *ast.LoadDataStmt) (
return p, nil
}
+func (b *PlanBuilder) buildLoadSSTFile(ld *ast.LoadSSTFileStmt) (Plan, error) {
+ p := &LoadSSTFile{
+ Path: ld.Path,
+ }
+ return p, nil
+}
+
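The dispatch in Build is an ordinary Go type switch: each AST node type gets one case that hands the node to its builder. The standalone sketch below shows the pattern with hypothetical, simplified types (StmtNode, Plan, and the structs are stand-ins, not TiDB's definitions), including what happens when a statement type has no case.

```go
package main

import "fmt"

// StmtNode stands in for an AST statement node (hypothetical).
type StmtNode interface{ stmtNode() }

type LoadDataStmt struct{ Path, Table string }
type LoadSSTFileStmt struct{ Path string }
type SelectStmt struct{} // deliberately has no case in build below

func (*LoadDataStmt) stmtNode()    {}
func (*LoadSSTFileStmt) stmtNode() {}
func (*SelectStmt) stmtNode()      {}

// Plan stands in for an execution plan (hypothetical).
type Plan interface{ Name() string }

type LoadData struct{ Path, Table string }
type LoadSSTFile struct{ Path string }

func (*LoadData) Name() string    { return "LoadData" }
func (*LoadSSTFile) Name() string { return "LoadSSTFile" }

// build picks a plan by the concrete type of the AST node. Adding a new
// statement means adding one case here and one plan type.
func build(node StmtNode) (Plan, error) {
	switch x := node.(type) {
	case *LoadDataStmt:
		return &LoadData{Path: x.Path, Table: x.Table}, nil
	case *LoadSSTFileStmt:
		return &LoadSSTFile{Path: x.Path}, nil
	}
	return nil, fmt.Errorf("unsupported statement type %T", node)
}

func main() {
	p, err := build(&LoadSSTFileStmt{Path: "table0.sst"})
	fmt.Println(p.Name(), err)

	_, err = build(&SelectStmt{})
	fmt.Println(err)
}
```

Forgetting the new case is an easy mistake to make: the statement would parse correctly and then fail at planning time, as the second call in main shows.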
Construct Executor and Execute
After generating the execution plan, we need to construct the corresponding Executor and then execute it. TiDB uses the Volcano execution model: initialization work goes in the Open method, the main functionality is implemented in the Next method, and cleanup and resource release happen in the Close method after execution.
We need to define the LOAD SST FILE Executor and let it implement the executor.Executor interface. You can place the related definitions in the tidb/executor/executor.go file:
// LoadSSTFileExec represents a load sst file executor.
type LoadSSTFileExec struct {
	baseExecutor

	path string
	done bool
}

// Open implements the Executor Open interface.
func (e *LoadSSTFileExec) Open(ctx context.Context) error {
	logutil.BgLogger().Warn("----- load sst file open, you can initialize some resource here")
	return nil
}

// Next implements the Executor Next interface.
func (e *LoadSSTFileExec) Next(ctx context.Context, req *chunk.Chunk) error {
	req.Reset()
	// The work must run only once, even if Next is called again.
	if e.done {
		return nil
	}
	e.done = true
	logutil.BgLogger().Warn("----- load sst file exec", zap.String("file", e.path))
	return nil
}

// Close implements the Executor Close interface.
func (e *LoadSSTFileExec) Close() error {
	logutil.BgLogger().Warn("----- load sst file close, you can release some resource here")
	return nil
}
If there is no initialization and cleanup work, you can skip implementing the Open and Close methods because baseExecutor has already implemented them.
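The Open, Next, Close life cycle is easier to see from the caller's side. The sketch below is a simplified model, not TiDB's executor package: the Executor interface here passes a string slice instead of a chunk, the log field replaces real logging, and drain is a hypothetical driver loop that stops when Next returns an empty batch.

```go
package main

import "fmt"

// Executor is a simplified stand-in for a Volcano-style operator.
type Executor interface {
	Open() error
	Next(out *[]string) error // fills out with the next batch; empty means finished
	Close() error
}

type loadSSTFileExec struct {
	path string
	done bool
	log  []string // records the call order for illustration
}

func (e *loadSSTFileExec) Open() error {
	e.log = append(e.log, "open")
	return nil
}

// Next does the work exactly once; later calls are no-ops thanks to done.
func (e *loadSSTFileExec) Next(out *[]string) error {
	*out = (*out)[:0]
	if e.done {
		return nil
	}
	e.done = true
	e.log = append(e.log, "exec "+e.path)
	return nil
}

func (e *loadSSTFileExec) Close() error {
	e.log = append(e.log, "close")
	return nil
}

// drain drives an executor: Open once, call Next until a batch comes back
// empty, and always Close afterwards.
func drain(e Executor) (err error) {
	if err = e.Open(); err != nil {
		return err
	}
	defer func() {
		if cerr := e.Close(); err == nil {
			err = cerr
		}
	}()
	var batch []string
	for {
		if err = e.Next(&batch); err != nil {
			return err
		}
		if len(batch) == 0 {
			return nil
		}
	}
}

func main() {
	e := &loadSSTFileExec{path: "table0.sst"}
	if err := drain(e); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(e.log)
}
```

Because this statement returns no rows, its first batch is already empty and the loop ends after one call; the done flag only guards against the work being repeated if the caller invokes Next again.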
To allow TiDB to generate the LoadSSTFileExec Executor based on the LoadSSTFile execution plan, we need to modify the tidb/executor/builder.go file. Below is the git diff showing the modifications:
diff --git a/executor/builder.go b/executor/builder.go
index 1154633bd5..4f0478daa6 100644
--- a/executor/builder.go
+++ b/executor/builder.go
@@ -199,6 +199,8 @@ func (b *executorBuilder) build(p plannercore.Plan) Executor {
return b.buildInsert(v)
case *plannercore.LoadData:
return b.buildLoadData(v)
+ case *plannercore.LoadSSTFile:
+ return b.buildLoadSSTFile(v)
case *plannercore.LoadStats:
return b.buildLoadStats(v)
case *plannercore.IndexAdvise:
@@ -944,6 +946,14 @@ func (b *executorBuilder) buildLoadData(v *plannercore.LoadData) Executor {
return loadDataExec
}
+func (b *executorBuilder) buildLoadSSTFile(v *plannercore.LoadSSTFile) Executor {
+ e := &LoadSSTFileExec{
+ baseExecutor: newBaseExecutor(b.ctx, nil, v.ID()),
+ path: v.Path,
+ }
+ return e
+}
+
Verification
At this point, we have successfully added a “feature” to TiDB. We can compile TiDB and start it to verify:
make # Compile TiDB server
bin/tidb-server # Start a TiDB server
Then open a new terminal and connect using the MySQL client to test the new feature:
▶ mysql -u root -h 127.0.0.1 -P 4000
mysql> load sst file 'table0.sst';
Query OK, 0 rows affected (0.00 sec)
You can see that the execution was successful, and in the tidb-server output logs, you can see the log output of the Executor execution for this feature:
[2022/09/19 15:24:02.745 +08:00] [WARN] [executor.go:2213] ["----- load sst file open, you can initialize some resource here"]
[2022/09/19 15:24:02.745 +08:00] [WARN] [executor.go:2225] ["----- load sst file exec"] [file=table0.sst]
[2022/09/19 15:24:02.745 +08:00] [WARN] [executor.go:2231] ["----- load sst file close, you can release some resource here"]
Summary
The code example for this article can be found here. You can check it out yourself.
This article teaches you how to add a new feature to TiDB by “following the example,” but it also omits some details, such as permission checks and adding comprehensive tests. I hope it helps the readers. If you want to learn more background knowledge and details, I recommend reading the TiDB Development Guide and the TiDB Source Code Reading Blog.