[Practical Guide] Quickly Adding a New Feature to TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【实用指南】快速给 TiDB 新增一个功能

| username: TiDB社区小助手

Author: Chen Shuang

PS: If you are taking part in the TiDB product group and plan to add new features to TiDB components, this article is worth a look.

TiDB Hackathon 2022 is coming soon. This article walks through, step by step, how to add a new feature to TiDB quickly, so that even readers without much background knowledge can do it.

Suppose we want to import an SST file into TiDB by adding the LOAD SST FILE <file_path> syntax.

When TiDB receives an SQL request, it generally parses the SQL into an AST (abstract syntax tree) → builds an execution plan → constructs an Executor and runs it. Let’s start by implementing the syntax.

Syntax Implementation

How do we implement the syntax? The simplest approach is to follow the similar, existing LOAD DATA syntax and adapt it.

Step-1: Add an AST Node

In the AST, LOAD DATA is represented by ast.LoadDataStmt. Similarly, we add a LoadSSTFileStmt node in tidb/parser/ast/dml.go:

// LoadSSTFileStmt is a statement to load sst file.
type LoadSSTFileStmt struct {
   dmlNode

   Path string
}

// Restore implements Node interface.
func (n *LoadSSTFileStmt) Restore(ctx *format.RestoreCtx) error {
   ctx.WriteKeyWord("LOAD SST FILE ")
   ctx.WriteString(n.Path)
   return nil
}

// Accept implements Node Accept interface.
func (n *LoadSSTFileStmt) Accept(v Visitor) (Node, bool) {
   newNode, _ := v.Enter(n)
   return v.Leave(newNode)
}

The Restore method converts the AST back into the corresponding SQL text. The Accept method lets other code traverse the AST with a Visitor; for example, TiDB’s preprocessing step uses Accept to walk every node in the tree. Because LoadSSTFileStmt has no child nodes, its Accept only needs to call Enter and then Leave.

Step-2: Add Syntax

In the grammar, LOAD DATA is defined by the LoadDataStmt rule. Similarly, we add a LoadSSTFileStmt rule in tidb/parser/parser.y. Several places need to change, as shown in the git diff below:

diff --git a/parser/parser.y b/parser/parser.y
index 1539bb13db..079859e8a9 100644
--- a/parser/parser.y
+++ b/parser/parser.y
@@ -243,6 +243,7 @@ import (
        sqlCalcFoundRows  "SQL_CALC_FOUND_ROWS"
        sqlSmallResult    "SQL_SMALL_RESULT"
        ssl               "SSL"
+       sst               "SST"
        starting          "STARTING"
        statsExtended     "STATS_EXTENDED"
        straightJoin      "STRAIGHT_JOIN"
@@ -908,6 +909,7 @@ import (
        IndexAdviseStmt            "INDEX ADVISE statement"
        KillStmt                   "Kill statement"
        LoadDataStmt               "Load data statement"
+       LoadSSTFileStmt            "Load sst file statement"
        LoadStatsStmt              "Load statistic statement"
        LockTablesStmt             "Lock tables statement"
        NonTransactionalDeleteStmt "Non-transactional delete statement"
@@ -11324,6 +11326,7 @@ Statement:
 |      IndexAdviseStmt
 |      KillStmt
 |      LoadDataStmt
+|      LoadSSTFileStmt
 |      LoadStatsStmt
 |      PlanReplayerStmt
 |      PreparedStmt
@@ -13496,6 +13499,14 @@ LoadDataStmt:
                $$ = x
        }

+LoadSSTFileStmt:
+       "LOAD" "SST" "FILE" stringLit
+       {
+               $$ = &ast.LoadSSTFileStmt{
+                       Path: $4,
+               }
+       }
+

In the above modifications:

  • The first hunk registers SST as a new token, because SST is a new keyword in the syntax.
  • The second and third hunks declare the new LoadSSTFileStmt rule and add it as one of the alternatives of Statement.
  • The last hunk defines the structure of LoadSSTFileStmt as LOAD SST FILE <file_path>. The first three keywords are fixed, so we write them directly as "LOAD" "SST" "FILE". The fourth symbol is the file path, a variable value, which stringLit matches. The action block then uses this value to initialize ast.LoadSSTFileStmt, where $4 refers to the value of the fourth symbol, stringLit.
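The rule can also be read as a plain pattern match over symbols. The sketch below hand-codes it in Go to show what $4 points at. The token type and parseLoadSSTFile are hypothetical; goyacc actually generates a table-driven LALR parser from parser.y rather than code like this.

```go
package main

import (
	"errors"
	"fmt"
)

// token is a hypothetical lexer output: the symbol kind plus, for
// string literals, the unquoted value.
type token struct {
	kind  string
	value string
}

// LoadSSTFileStmt is a stand-in for ast.LoadSSTFileStmt.
type LoadSSTFileStmt struct {
	Path string
}

// parseLoadSSTFile hand-codes the rule LoadSSTFileStmt: "LOAD" "SST" "FILE" stringLit.
// Symbols are numbered from 1 like goyacc's $1..$4, so $4 is toks[3].
func parseLoadSSTFile(toks []token) (*LoadSSTFileStmt, error) {
	want := []string{"LOAD", "SST", "FILE", "stringLit"}
	if len(toks) != len(want) {
		return nil, errors.New("syntax error: wrong number of symbols")
	}
	for i, kind := range want {
		if toks[i].kind != kind {
			return nil, fmt.Errorf("syntax error at $%d: want %s, got %s", i+1, kind, toks[i].kind)
		}
	}
	// Equivalent of the rule's action: $$ = &ast.LoadSSTFileStmt{Path: $4}
	return &LoadSSTFileStmt{Path: toks[3].value}, nil
}

func main() {
	toks := []token{
		{kind: "LOAD"}, {kind: "SST"}, {kind: "FILE"},
		{kind: "stringLit", value: "table0.sst"},
	}
	stmt, err := parseLoadSSTFile(toks)
	if err != nil {
		panic(err)
	}
	fmt.Println(stmt.Path) // table0.sst

	_, err = parseLoadSSTFile(toks[:3]) // the path is missing
	fmt.Println(err)                    // syntax error: wrong number of symbols
}
```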

Because SST is a new keyword, we also need to add it to tokenMap in tidb/parser/misc.go so that the lexer recognizes it:

diff --git a/parser/misc.go b/parser/misc.go
index 140619bb07..418e9dd6a4 100644
--- a/parser/misc.go
+++ b/parser/misc.go
@@ -669,6 +669,7 @@ var tokenMap = map[string]int{
        "SQL_TSI_YEAR":             sqlTsiYear,
        "SQL":                      sql,
        "SSL":                      ssl,
+       "SST":                      sst,
        "STALENESS":                staleness,
        "START":                    start,
        "STARTING":                 starting,
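Registering the keyword matters because the lexer decides whether a word is a keyword by looking it up in tokenMap. The sketch below shows the idea, assuming (roughly as TiDB’s scanner does) that the word is upper-cased before the lookup. The token constants and classifyWord are hypothetical; the real token IDs are generated by goyacc from parser.y.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical token IDs; the real ones are generated by goyacc from parser.y.
const (
	identifier = iota + 1
	load
	sst
	fileKwd
)

// classifyWord returns the keyword token for word, or identifier when the
// word is not in keywords. Upper-casing makes the lookup case-insensitive.
func classifyWord(keywords map[string]int, word string) int {
	if tok, ok := keywords[strings.ToUpper(word)]; ok {
		return tok
	}
	return identifier
}

func main() {
	tokenMap := map[string]int{"LOAD": load, "SST": sst, "FILE": fileKwd}
	fmt.Println(classifyWord(tokenMap, "sst") == sst) // true

	// Without the "SST" entry, sst is lexed as a plain identifier, so the
	// "LOAD" "SST" "FILE" stringLit rule can never match.
	withoutSST := map[string]int{"LOAD": load, "FILE": fileKwd}
	fmt.Println(classifyWord(withoutSST, "sst") == identifier) // true
}
```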

Step-3: Compile and Test

Compile to generate the new parser files.

cd parser
make fmt  # Format the code
make      # Compile to generate the new parser files

To verify that the new syntax works, we can add a case to TestDMLStmt in tidb/parser/parser_test.go. Each case gives the input SQL, whether it should parse successfully, and the SQL expected from Restore. The git diff below shows the change:

diff --git a/parser/parser_test.go b/parser/parser_test.go
index 7093c3889f..d2c75c4c59 100644
--- a/parser/parser_test.go
+++ b/parser/parser_test.go
@@ -666,6 +666,9 @@ func TestDMLStmt(t *testing.T) {
                {"LOAD DATA LOCAL INFILE '/tmp/t.csv' IGNORE INTO TABLE t1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';", true, "LOAD DATA LOCAL INFILE '/tmp/t.csv' IGNORE INTO TABLE `t1` FIELDS TERMINATED BY ','"},
                {"LOAD DATA LOCAL INFILE '/tmp/t.csv' REPLACE INTO TABLE t1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';", true, "LOAD DATA LOCAL INFILE '/tmp/t.csv' REPLACE INTO TABLE `t1` FIELDS TERMINATED BY ','"},

+               // load sst file test
+               {"load sst file 'table0.sst'", true, "LOAD SST FILE 'table0.sst'"},
+

Then run the tests:

cd parser
make test                   # Run all parser tests
go test -run="TestDMLStmt"  # For a quick check, run only the modified TestDMLStmt test
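Each TestDMLStmt case is a round trip: parse the input, call Restore, and compare the result with the expected SQL. The sketch below reproduces that table-driven shape outside TiDB. The regular expression is only a hypothetical stand-in for the generated parser (it rejects paths that contain a single quote), and testCase, parse, restore, and runCases are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
)

// testCase mirrors the {src, ok, restore} triples used in TestDMLStmt.
type testCase struct {
	src     string // SQL to parse
	ok      bool   // whether parsing should succeed
	restore string // SQL expected from Restore
}

// loadSSTRe is a hypothetical stand-in for the generated parser: keywords are
// case-insensitive and the path must be a single-quoted literal.
var loadSSTRe = regexp.MustCompile(`(?i)^\s*load\s+sst\s+file\s+'([^']*)'\s*;?\s*$`)

func parse(src string) (path string, ok bool) {
	m := loadSSTRe.FindStringSubmatch(src)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func restore(path string) string {
	return "LOAD SST FILE '" + path + "'"
}

// runCases parses each case, restores it, and reports every mismatch.
func runCases(cases []testCase) []string {
	var failures []string
	for _, c := range cases {
		path, ok := parse(c.src)
		if ok != c.ok {
			failures = append(failures, fmt.Sprintf("%q: ok=%v, want %v", c.src, ok, c.ok))
			continue
		}
		if got := restore(path); ok && got != c.restore {
			failures = append(failures, fmt.Sprintf("%q: restored %q, want %q", c.src, got, c.restore))
		}
	}
	return failures
}

func main() {
	cases := []testCase{
		{"load sst file 'table0.sst'", true, "LOAD SST FILE 'table0.sst'"},
		{"LOAD SST FILE '/tmp/a.sst';", true, "LOAD SST FILE '/tmp/a.sst'"},
		{"load sst file table0.sst", false, ""}, // the path must be a string literal
	}
	failures := runCases(cases)
	fmt.Printf("%d of %d cases failed\n", len(failures), len(cases)) // 0 of 3 cases failed
	for _, f := range failures {
		fmt.Println(f)
	}
}
```

Including a case that must fail (ok=false) is cheap and catches a grammar that accidentally accepts too much.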

Generate Execution Plan

Once the AST has been generated, TiDB needs to build the corresponding execution plan, so we define a plan for LOAD SST FILE. Following the LoadData plan (used by LOAD DATA) in tidb/planner/core/common_plans.go, we define a LoadSSTFile plan:

// LoadSSTFile represents a load sst file plan.
type LoadSSTFile struct {
        baseSchemaProducer

        Path string
}

To let TiDB build a LoadSSTFile plan from an ast.LoadSSTFileStmt, we add a buildLoadSSTFile method to tidb/planner/core/planbuilder.go, modeled on buildLoadData, and call it from Build. The git diff below shows the change:

diff --git a/planner/core/planbuilder.go b/planner/core/planbuilder.go
index ad7ce64748..c68e992b35 100644
--- a/planner/core/planbuilder.go
+++ b/planner/core/planbuilder.go
@@ -734,6 +734,8 @@ func (b *PlanBuilder) Build(ctx context.Context, node ast.Node) (Plan, error) {
                return b.buildInsert(ctx, x)
        case *ast.LoadDataStmt:
                return b.buildLoadData(ctx, x)
+       case *ast.LoadSSTFileStmt:
+               return b.buildLoadSSTFile(x)
@@ -3979,6 +3981,13 @@ func (b *PlanBuilder) buildLoadData(ctx context.Context, ld *ast.LoadDataStmt) (
        return p, nil
 }

+func (b *PlanBuilder) buildLoadSSTFile(ld *ast.LoadSSTFileStmt) (Plan, error) {
+       p := &LoadSSTFile{
+               Path: ld.Path,
+       }
+       return p, nil
+}
+

Construct Executor and Execute

After the execution plan is generated, we need to construct the corresponding Executor and run it. TiDB’s execution engine follows the Volcano model: put initialization work in the Open method, implement the main logic in the Next method, which the caller invokes repeatedly until it produces an empty chunk, and release resources in the Close method once execution finishes.

Next, we define the LOAD SST FILE Executor and make it implement the executor.Executor interface. The definitions can go in the tidb/executor/executor.go file:

// LoadSSTFileExec represents a load sst file executor.
type LoadSSTFileExec struct {
   baseExecutor

   path string
   done bool
}

// Open implements the Executor Open interface.
func (e *LoadSSTFileExec) Open(ctx context.Context) error {
   logutil.BgLogger().Warn("----- load sst file open, you can initialize some resource here")
   return nil
}

// Next implements the Executor Next interface.
func (e *LoadSSTFileExec) Next(ctx context.Context, req *chunk.Chunk) error {
   req.Reset()
   if e.done {
      return nil
   }
   e.done = true

   logutil.BgLogger().Warn("----- load sst file exec", zap.String("file", e.path))
   return nil
}

// Close implements the Executor Close interface.
func (e *LoadSSTFileExec) Close() error {
   logutil.BgLogger().Warn("----- load sst file close, you can release some resource here")
   return nil
}

If there is no initialization or cleanup work, you can omit the Open and Close methods, because the embedded baseExecutor already implements them.

To let TiDB build a LoadSSTFileExec from a LoadSSTFile plan, we modify the tidb/executor/builder.go file. The git diff below shows the change:

diff --git a/executor/builder.go b/executor/builder.go
index 1154633bd5..4f0478daa6 100644
--- a/executor/builder.go
+++ b/executor/builder.go
@@ -199,6 +199,8 @@ func (b *executorBuilder) build(p plannercore.Plan) Executor {
                return b.buildInsert(v)
        case *plannercore.LoadData:
                return b.buildLoadData(v)
+       case *plannercore.LoadSSTFile:
+               return b.buildLoadSSTFile(v)
        case *plannercore.LoadStats:
                return b.buildLoadStats(v)
        case *plannercore.IndexAdvise:
@@ -944,6 +946,14 @@ func (b *executorBuilder) buildLoadData(v *plannercore.LoadData) Executor {
        return loadDataExec
 }

+func (b *executorBuilder) buildLoadSSTFile(v *plannercore.LoadSSTFile) Executor {
+       e := &LoadSSTFileExec{
+               baseExecutor: newBaseExecutor(b.ctx, nil, v.ID()),
+               path:         v.Path,
+       }
+       return e
+}
+
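The new case in build follows the same dispatch pattern, except that build returns only an Executor, so a failure has to be recorded elsewhere. A hedged sketch with hypothetical stand-in types; keeping the error in an err field on the builder is an assumption modeled on executor/builder.go.

```go
package main

import "fmt"

// Plan is a hypothetical stand-in for plannercore.Plan.
type Plan interface {
	ID() int
}

// LoadSSTFile is a stand-in for the plan defined in common_plans.go.
type LoadSSTFile struct {
	id   int
	Path string
}

func (p *LoadSSTFile) ID() int { return p.id }

// unknownPlan has no case in build, to show the failure path.
type unknownPlan struct{}

func (unknownPlan) ID() int { return 0 }

// Executor is reduced to a single method for this sketch.
type Executor interface {
	Describe() string
}

type baseExecutor struct {
	id int
}

type LoadSSTFileExec struct {
	baseExecutor
	path string
}

func (e *LoadSSTFileExec) Describe() string {
	return fmt.Sprintf("executor %d: load sst file %s", e.id, e.path)
}

// executorBuilder records a failure in err because build returns no error.
type executorBuilder struct {
	err error
}

func (b *executorBuilder) build(p Plan) Executor {
	switch v := p.(type) {
	case *LoadSSTFile:
		return b.buildLoadSSTFile(v)
	default:
		b.err = fmt.Errorf("unknown plan %T", p)
		return nil
	}
}

// buildLoadSSTFile copies the plan ID and path into a new executor.
func (b *executorBuilder) buildLoadSSTFile(v *LoadSSTFile) Executor {
	return &LoadSSTFileExec{
		baseExecutor: baseExecutor{id: v.ID()},
		path:         v.Path,
	}
}

func main() {
	b := &executorBuilder{}
	e := b.build(&LoadSSTFile{id: 1, Path: "table0.sst"})
	fmt.Println(e.Describe()) // executor 1: load sst file table0.sst

	failed := b.build(unknownPlan{})
	fmt.Println(failed == nil, b.err) // true unknown plan main.unknownPlan
}
```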

Verification

At this point, we have successfully added a “feature” to TiDB. We can compile TiDB and start it to verify:

make    # Compile TiDB server
bin/tidb-server  # Start a TiDB server

Then open a new terminal and connect using the MySQL client to test the new feature:

▶ mysql -u root -h 127.0.0.1 -P 4000

mysql> load sst file 'table0.sst';
Query OK, 0 rows affected (0.00 sec)

The statement succeeds, and the tidb-server log shows the messages printed by the Executor’s Open, Next, and Close methods:

[2022/09/19 15:24:02.745 +08:00] [WARN] [executor.go:2213] ["----- load sst file open, you can initialize some resource here"]
[2022/09/19 15:24:02.745 +08:00] [WARN] [executor.go:2225] ["----- load sst file exec"] [file=table0.sst]
[2022/09/19 15:24:02.745 +08:00] [WARN] [executor.go:2231] ["----- load sst file close, you can release some resource here"]

Summary

The code example for this article can be found here. You can check it out yourself.

This article shows how to add a new feature to TiDB by “following an existing example,” but it omits some details, such as permission checks and comprehensive tests. I hope it helps. If you want to learn more background knowledge and details, I recommend reading the TiDB Development Guide and the TiDB Source Code Reading Blog.

| username: BraveChen

Haha, what a pro!

| username: Fly-bird

LOAD SST FILE <file_path>: this is awesome.

| username: TiDBer_小阿飞

I’m getting old; it took me an hour to read your article.