TiDB 7.0 is here! Come and see if there are any important new features for you in 7.0!

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 7.0 来咯!快来看看 7.0 有没有对你来说很重要的新特性!

| username: Billmay表妹

Release Date: March 30, 2023

TiDB Version: 7.0.0

Trial Links: Quick Experience | Download Offline Package

In version 7.0.0, you can get the following key features:

| Category | Feature | Description |
|---|---|---|
| Scalability and Performance | Session-level SQL execution plan cache without manual preparation (Experimental) | Supports automatically reusing cached execution plans at the session level, which skips compilation and shortens processing time for identical SQL queries without requiring a manual Prepare Statement. |
| Scalability and Performance | TiFlash supports storage-compute separation and S3 shared storage (Experimental) | TiFlash adds a cloud-native architecture as an option: supports a storage-compute separation architecture for greater HTAP resource elasticity, and supports an S3-based storage engine, providing shared storage at a lower cost. |
| Stability and High Availability | Enhanced resource control (Experimental) | Supports using resource groups to allocate and isolate resources for different applications or workloads within a cluster. This version adds support for binding modes (user-level, session-level, statement-level) and user-defined priorities, and lets you estimate the cluster's overall resource capacity with commands. |
| Stability and High Availability | TiFlash supports data spill to disk | TiFlash supports spilling intermediate results to disk to alleviate OOM issues in data-intensive operations (e.g., aggregation, sorting, and hash join). |
| SQL | Row-level TTL (GA) | Supports automatically deleting data whose time-to-live (TTL) has expired through background tasks, thereby automatically managing data size and improving performance. |
| SQL | Supports REORGANIZE PARTITION syntax (List/Range partitioned tables) | The REORGANIZE PARTITION statement can be used to merge adjacent partitions or split one partition into multiple partitions, improving the usability of partitioned tables. |
| Database Management and Observability | TiDB integrates TiDB Lightning through the LOAD DATA statement (Experimental) | Integrating TiDB Lightning's logical import mode makes the LOAD DATA statement more powerful, for example supporting data import from S3/GCS and task management. |
| Database Management and Observability | TiCDC supports object storage sink (GA) | TiCDC supports replicating row change events to object storage services, including Amazon S3, GCS, Azure Blob Storage, and NFS. |
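
As a quick sketch of the two SQL-facing features summarized above, and assuming hypothetical table, column, and partition names, row-level TTL and REORGANIZE PARTITION look roughly like this:

```sql
-- Row-level TTL (GA): background jobs delete rows older than 90 days.
CREATE TABLE access_log (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    payload    VARCHAR(255)
) TTL = `created_at` + INTERVAL 90 DAY TTL_ENABLE = 'ON';

-- REORGANIZE PARTITION: merge two adjacent RANGE partitions into one.
ALTER TABLE sales
    REORGANIZE PARTITION p2021, p2022
    INTO (PARTITION p2021_2022 VALUES LESS THAN (2023));
```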

Feature Details

Scalability

  • TiFlash supports storage-compute separation and object storage (Experimental) #6882 @flowbehappy @JaySon-Huang @breezewish @JinheLin @lidezhu @CalvinNeo Before v7.0.0, TiFlash used a storage-compute integrated architecture: TiFlash nodes were both storage and compute nodes, their compute and storage capacities could not be scaled independently, and they could only use local storage. Starting from v7.0.0, TiFlash introduces a storage-compute separation architecture. In this architecture, TiFlash nodes are divided into Compute Nodes and Write Nodes and support S3 API-compatible object storage. These nodes can be scaled independently, so compute capacity and storage capacity can be adjusted separately. The storage-compute separation architecture and the integrated architecture cannot be mixed or converted between each other; you need to specify the architecture when deploying TiFlash. For more information, please refer to the user documentation.

Performance

  • Fast Online DDL is now compatible with PITR #38045 @Leavrth In TiDB v6.5.0, the Fast Online DDL feature and PITR were not fully compatible: when using TiDB v6.5.0, it was recommended to first stop the PITR background backup task, quickly add indexes using Fast Online DDL, and then resume the PITR backup task and perform a full data backup. Starting from TiDB v7.0.0, Fast Online DDL and PITR are fully compatible. When restoring cluster data through PITR, the index-creation operations recorded during the log backup period are automatically replayed via Fast Online DDL. For more information, please refer to the user documentation.
  • TiFlash engine supports Null-Aware Semi Join and Null-Aware Anti Semi Join operators #6674 @gengliqi When correlated subqueries are introduced by IN, NOT IN, = ANY, or != ALL operators, TiDB converts them to Semi Join or Anti Semi Join to improve computation performance. If the converted Join key column may contain NULL values, a null-aware join algorithm is required, i.e., the Null-Aware Semi Join and Null-Aware Anti Semi Join operators. In versions before v7.0.0, the TiFlash engine did not support these operators, so such subqueries could not be pushed down to the TiFlash engine for computation. Starting from TiDB v7.0.0, the TiFlash engine supports Null-Aware Semi Join and Null-Aware Anti Semi Join operators. If the SQL contains such correlated subqueries, the table has TiFlash replicas, and MPP mode is enabled, the optimizer automatically decides whether to push these operators down to the TiFlash engine to improve overall performance. For more information, please refer to the user documentation.
  • TiFlash engine supports FastScan feature (GA) #5252 @hongyunyan Starting from v6.3.0, the TiFlash engine introduced the FastScan feature as an experimental feature. In v7.0.0, this feature is officially GA. You can enable the FastScan feature using the system variable tiflash_fastscan. By sacrificing strong consistency guarantees, this feature can significantly improve table scan performance. If the corresponding table only performs INSERT operations without UPDATE/DELETE operations, the FastScan feature can improve table scan performance without losing strong consistency. For more information, please refer to the user documentation.
  • TiFlash queries support delayed materialization (Experimental) #5829 @Lloyd-Pottiger When a SELECT statement contains filter conditions (a WHERE clause), TiFlash by default reads all the data of the columns required by the query and then performs filtering, aggregation, and other computations based on the query conditions. Delayed materialization is an optimization that pushes part of the filter conditions down to the TableScan operator: it first scans the column data related to the filter conditions, filters out the rows that meet the conditions, and then scans the remaining columns of only those rows for further computation, thereby reducing I/O and the amount of data processed. TiFlash delayed materialization is disabled by default and can be enabled by setting the system variable tidb_opt_enable_late_materialization to ON. Once enabled, the TiDB optimizer decides which filter conditions to push down to the TableScan operator based on statistics and the query's filter conditions. For more information, please refer to the user documentation.
  • Supports caching execution plans for non-Prepare statements (Experimental) #36598 @qw4990 Execution plan caching is an important means of improving throughput under concurrent OLTP workloads. TiDB already supports plan caching for Prepare statements. In v7.0.0, execution plans for non-Prepare statements can also be cached, making execution plan caching applicable to a wider range of scenarios and improving TiDB's concurrent processing capacity. This feature is currently disabled by default and can be enabled through the system variable tidb_enable_non_prepared_plan_cache. For stability reasons, in the current version TiDB uses a separate cache area for the execution plans of non-Prepare statements, and its size can be set through the system variable tidb_non_prepared_plan_cache_size. Additionally, this feature has certain limitations on the SQL patterns it supports, as detailed in the usage restrictions. For more information, please refer to the user documentation.
  • Removes the restriction on subqueries for execution plan caching #40219 @fzzf678 TiDB v7.0.0 removes the restriction on subqueries for plan caching. Execution plans for SQL statements with subqueries can be cached, such as SELECT * FROM t WHERE a > (SELECT ...). This further expands the application scope of execution plan caching, improving SQL execution efficiency. For more information, please refer to the user documentation.
  • TiKV supports automatically generating empty log files for log recycling #14371 @LykxSassinator TiKV introduced the Raft log recycling feature in v6.3.0 to reduce long-tail latency of write workloads. However, log recycling only takes effect after the number of Raft log files reaches a certain threshold, making it hard for users to directly observe the write-throughput improvement this feature brings. To improve the experience, v7.0.0 officially introduces the raft-engine.prefill-for-recycle configuration item, which controls whether TiKV automatically generates empty log files for log recycling at process startup. When this configuration item is enabled, TiKV automatically pre-fills a batch of empty log files during initialization, ensuring that log recycling takes effect immediately after initialization. For more information, please refer to the user documentation.
  • Supports deriving TopN or Limit from window functions optimization rules to improve window function performance #13936 @windtalker This feature is disabled by default and needs to be enabled by setting the session variable tidb_opt_derive_topn to ON. For more information, please refer to the user documentation.
  • Supports creating unique indexes through Fast Online DDL #40730 @tangenta TiDB v6.5.0 supports creating ordinary secondary indexes through Fast Online DDL. Starting from v7.0.0, TiDB supports creating unique indexes through Fast Online DDL. Compared to TiDB v6.1.0, the performance of adding unique indexes to large tables is expected to improve several times. For more information, please refer to the user documentation.
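
For the null-aware semi join / anti semi join pushdown described in the list above, a minimal sketch follows. The tables t1 and t2 are hypothetical; the sketch assumes both have TiFlash replicas and that the join key column may contain NULL values:

```sql
-- Hypothetical setup: give both tables a TiFlash replica.
ALTER TABLE t1 SET TIFLASH REPLICA 1;
ALTER TABLE t2 SET TIFLASH REPLICA 1;

SET SESSION tidb_allow_mpp = ON;   -- MPP mode (ON by default)

-- NOT IN over a nullable column needs a null-aware anti semi join; with the
-- conditions above, the optimizer may push it down to TiFlash (MPP tasks).
EXPLAIN SELECT * FROM t1 WHERE t1.a NOT IN (SELECT b FROM t2);
```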
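
The FastScan switch mentioned in the list above is an ordinary system variable; a minimal sketch, again with a hypothetical table that has a TiFlash replica:

```sql
-- Trade strong consistency for faster TiFlash table scans in this session.
SET SESSION tiflash_fastscan = ON;

-- If the optimizer chooses the TiFlash replica, this scan uses FastScan.
SELECT COUNT(*) FROM access_log;

-- Switch back when strongly consistent reads are required.
SET SESSION tiflash_fastscan = OFF;
```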
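
Delayed (late) materialization, as described above, is also controlled by a system variable. A sketch with a hypothetical table and filter:

```sql
-- Enable TiFlash late materialization for the current session.
SET SESSION tidb_opt_enable_late_materialization = ON;

-- With a selective WHERE condition, the optimizer may push part of the filter
-- into the TableScan so that non-matching rows never load the other columns.
EXPLAIN SELECT id, payload
FROM access_log
WHERE created_at > NOW() - INTERVAL 1 DAY;
```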
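
For the plan-cache items above (caching non-Prepare statements and removing the subquery restriction), a hedged sketch; whether a given statement is actually cacheable still depends on the documented plan cache restrictions, and t1/t2 are hypothetical tables:

```sql
-- Plan caching for ordinary (non-PREPARE) statements, disabled by default.
SET SESSION tidb_enable_non_prepared_plan_cache = ON;
SET GLOBAL tidb_non_prepared_plan_cache_size = 100;   -- size of the separate cache

-- Repeating the same statement shape should hit the cache on the second run.
SELECT * FROM t1 WHERE a > 10;
SELECT * FROM t1 WHERE a > 20;
SELECT @@last_plan_from_cache;   -- 1 means the plan was reused from the cache

-- With the subquery restriction removed, statements containing subqueries
-- can also have their execution plans cached.
PREPARE stmt FROM 'SELECT * FROM t1 WHERE a > (SELECT MIN(b) FROM t2)';
EXECUTE stmt;
EXECUTE stmt;
SELECT @@last_plan_from_cache;
DEALLOCATE PREPARE stmt;
```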
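
The window-function optimization above targets the common "top N by some ordering" pattern, where a ROW_NUMBER() result is filtered with a small constant. A sketch with a hypothetical table:

```sql
-- Allow the optimizer to derive a TopN/Limit from the window function.
SET SESSION tidb_opt_derive_topn = ON;

-- The rn <= 10 filter lets the optimizer keep only the top 10 rows
-- instead of materializing the window over the whole table.
EXPLAIN SELECT *
FROM (
    SELECT id, created_at,
           ROW_NUMBER() OVER (ORDER BY created_at DESC) AS rn
    FROM access_log
) AS ranked
WHERE rn <= 10;
```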
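
For creating unique indexes through Fast Online DDL, the switch and statement are the same ones already used for ordinary indexes; the column name below is hypothetical:

```sql
-- Fast Online DDL is controlled by tidb_ddl_enable_fast_reorg (ON by default since v6.5.0).
SET GLOBAL tidb_ddl_enable_fast_reorg = ON;

-- Starting from v7.0.0, the accelerated path also covers unique indexes.
ALTER TABLE access_log ADD UNIQUE INDEX uk_payload (payload);
```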

Stability

  • Enhanced resource control feature (Experimental) #38825 @nolouch @BornChanger @glorv @tiancaiamao @Connor1996 @JmPotato @hnes @CabinfeverB @HuSharp TiDB has optimized the resource control feature based on resource groups. This feature will greatly improve the resource utilization efficiency and performance of the TiDB cluster. The introduction of the resource control feature is a milestone for TiDB. You can divide a distributed database cluster into multiple logical units, map different database users to corresponding resource groups, and set quotas for each resource group as needed. When cluster resources are tight, all resources used by sessions from the same resource group are limited within the quota, preventing one resource group from over-consuming resources and affecting the normal operation of sessions in other resource groups. This feature also allows you to combine multiple small and medium-sized applications from different systems into a single TiDB cluster: even if the load of an individual application grows, it does not affect the normal operation of other applications.
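
As a rough sketch of the resource-group workflow described above (the group names, user name, and RU quotas are hypothetical and only illustrate the binding modes mentioned in this release):

```sql
-- Create resource groups with request-unit quotas and priorities.
CREATE RESOURCE GROUP IF NOT EXISTS rg_oltp  RU_PER_SEC = 2000 PRIORITY = HIGH;
CREATE RESOURCE GROUP IF NOT EXISTS rg_batch RU_PER_SEC = 500;

-- User-level binding: sessions of this user default to rg_oltp.
ALTER USER 'app_user'@'%' RESOURCE GROUP rg_oltp;

-- Session-level binding for the current connection.
SET RESOURCE GROUP rg_batch;

-- Statement-level binding through an optimizer hint.
SELECT /*+ RESOURCE_GROUP(rg_batch) */ COUNT(*) FROM access_log;

-- Estimate the overall request-unit capacity of the cluster.
CALIBRATE RESOURCE;
```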
| username: TiDBer_jYQINSnf | Original post link

:clap: :clap: :clap: :clap:

| username: Jellybean | Original post link

Many great features have landed; version 7.0 is worth trying out in a test environment.

| username: tidb菜鸟一只 | Original post link

Looking forward to many features.

| username: 孤君888 | Original post link

It looks much more powerful than 6, thumbs up~

| username: dockerfile | Original post link

New feature: key partitioning, very much looking forward to it.

| username: langpingtan | Original post link

Enhanced Resource Control (Experimental Feature)
Row-Level TTL (GA)

| username: JonnieLee | Original post link

I’ll go download it now.

| username: 考试没答案 | Original post link

Going to upgrade and fight monsters now.

| username: Jaimyjie | Original post link

Looking forward to TiFlash supporting the separation of storage and computation architecture.

| username: 望海崖2084 | Original post link

Strange, last time I saw it, it was still version 5.

| username: BraveChen | Original post link

The iteration is really fast.

| username: Jiawei | Original post link

Resource control is the best.

| username: myzz | Original post link

:smiley: :smiley: :smiley:

| username: Running | Original post link

Looking forward to GA

| username: xingzhenxiang | Original post link

I have never understood why a distributed database needs to be partitioned again.

| username: magic | Original post link

It can be understood as Hive’s partitioning + bucketing.

| username: 考试没答案 | Original post link

Looking forward to the feature that supports REORGANIZE PARTITION syntax (List/Range partitioned tables).