How to Improve Performance for a Single Table with 20 Billion Rows in TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 单表200亿如何提高性能?

| username: TiDBer_jYQINSnf

For a table like this (with some additional columns), roughly 20 billion rows per table and a daily data volume fluctuating between 100 million and 400 million rows, we run SQL like the following:

SELECT count(1) FROM test WHERE createTime BETWEEN '2024-3-24 00:00:00' AND '2024-3-24 23:59:59' AND no = '132943828293'

Query latency is at the minute level. Previously, with sharding, the same query finished within 5 seconds. We plan to switch to TiDB, but without TiFlash the query is unlikely to be fast. How can we optimize so that TiDB answers within 10 seconds?

Sharding spreads requests evenly across all machines. In TiDB, because data is split into Regions that are scheduled freely, one day's data may end up concentrated on only one or two machines. If the query QPS is low, the load will not reach all TiKV nodes, so only two or three TiKV nodes participate in each query and it stays slow. What are the optimization options here?
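
For reference, one way to keep one day's hot data from landing on just a couple of TiKV nodes is to pre-split and scatter Regions manually. A minimal sketch, assuming the table and index names used elsewhere in this thread:

-- Inspect how the table's Regions are currently distributed across TiKV.
SHOW TABLE test REGIONS;

-- Pre-split one day's range of the createTime index into 16 Regions so PD
-- can scatter them across nodes (names and Region count are illustrative).
SPLIT TABLE test INDEX idx_createTime
BETWEEN ("2024-03-24 00:00:00") AND ("2024-03-25 00:00:00") REGIONS 16;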

| username: 托马斯滑板鞋 | Original post link

Partition by createTime, one RANGE partition per day.
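
A minimal sketch of that layout, assuming a simplified schema (the real column list will differ):

-- Daily RANGE partitions on createTime. Note that every unique key of a
-- partitioned table must include the partition column, hence the composite
-- primary key.
CREATE TABLE test (
    id BIGINT NOT NULL AUTO_INCREMENT,
    no VARCHAR(32) NOT NULL,
    createTime DATETIME NOT NULL,
    -- ... other business columns ...
    PRIMARY KEY (id, createTime),
    KEY idx_createTime (createTime)
)
PARTITION BY RANGE (TO_DAYS(createTime)) (
    PARTITION p20240324 VALUES LESS THAN (TO_DAYS('2024-03-25')),
    PARTITION p20240325 VALUES LESS THAN (TO_DAYS('2024-03-26')),
    PARTITION pmax      VALUES LESS THAN (MAXVALUE)
);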

| username: TiDBer_jYQINSnf | Original post link

Is a partitioned table actually faster than a non-partitioned one? A partitioned table just splits a large table into many smaller ones. With an index on the time column, would a non-partitioned table really scan that much more data?

| username: 托马斯滑板鞋 | Original post link

Partition pruning significantly reduces the amount of data scanned (faster than indexing)

| username: Soysauce520 | Original post link

After v6.5, there is dynamic partition pruning. If the TiFlash machines are decent, they can handle the scans.
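
For what it is worth, the prune mode is controlled by a system variable; a sketch, assuming the table name from this thread:

-- Let partitioned tables share one plan and prune partitions at execution time.
SET GLOBAL tidb_partition_prune_mode = 'dynamic';

-- Re-collect statistics so table-level (global) stats are available.
ANALYZE TABLE test;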

| username: 托马斯滑板鞋 | Original post link

If TiDB supports local partitioned indexes like Oracle in the future, it will be even faster :upside_down_face:

| username: TiDBer_jYQINSnf | Original post link

Assuming 20 billion rows in total and 100 million per day: without partitioning, a query through the index scans 100 million index entries and then does 100 million table lookups, right?
Wouldn't a partitioned table also scan 100 million index entries and 100 million rows?
Or does the scan range exceed 100 million when the table is not partitioned?

| username: mono | Original post link

To be honest, a table like this may affect overall system stability. You should take this requirement to the data warehouse team!

| username: 托马斯滑板鞋 | Original post link

With a partitioned table there should not be 100 million table lookups, because the query reads the chosen partition directly instead of going through an index.

| username: 有猫万事足 | Original post link

This is a typical aggregation query, and TiFlash + MPP is indeed the best solution.
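
Enabling it for a single table is a small step; a minimal sketch, using the table name from this thread:

-- Add one TiFlash replica for the table; eligible aggregations can then run
-- through MPP automatically.
ALTER TABLE test SET TIFLASH REPLICA 1;

-- Check replication progress; AVAILABLE = 1 means the replica is usable.
SELECT TABLE_NAME, PROGRESS, AVAILABLE
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'test';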

Here is a reference: the execution plan of a similar count(1) on a partitioned table with 290 million rows.

[screenshot: execution plan]

The hardware is 2 machines with 4 cores and 8 GB RAM each, running 2 TiFlash instances with MPP.

The IO is relatively poor: it is Tencent Cloud with the cheapest storage. You can see the CPU barely fluctuates, not even as often as the GC that runs every 10 minutes.

Building targeted indexes on the row store is also an option. The question is whether the conditions in the WHERE clause are really that predictable; if not, efficiency will be very poor as soon as a query misses the index.

| username: TiDBer_jYQINSnf | Original post link

Looking at this SQL, it has to check all of that day's data, and there is no index on no, so the table scan is 100 million rows either way. The difference between partitioned and non-partitioned lies in the number of index entries scanned. Does a non-partitioned table scan more index entries than a partitioned one? Would it exceed 100 million?

| username: 托马斯滑板鞋 | Original post link

In theory it works like this:
Without partitioning, scan 100 million entries through the createTime index, then do 100 million table lookups.
With a partitioned table, the query lands directly on the day's partition and scans its 100 million rows without touching an index.
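
In SQL terms, the partitioned path can even skip the index explicitly; a sketch with an illustrative daily partition name:

-- Scan the day's partition directly (~100 million rows), with no index
-- lookup at all.
SELECT count(1)
FROM test PARTITION (p20240324) IGNORE INDEX (idx_createTime)
WHERE no = '132943828293';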

| username: TiDBer_jYQINSnf | Original post link

TiFlash should make it faster, but I want to know whether there are other options that do not involve TiFlash.

| username: 有猫万事足 | Original post link

If the WHERE conditions are predictable, then an index works. In my experience, though, they usually are not. Even if the developers are confident now, sooner or later a query misses the index, the TiKV CPU spikes with a full table scan, and you are the one cleaning up the mess.

| username: TiDBer_jYQINSnf | Original post link

With a partitioned table there is still an index scan. Will this IndexRangeScan read more rows than on a non-partitioned table?

mysql> explain SELECT count(1) FROM test WHERE createTime BETWEEN '2024-3-27 00:00:00' AND '2024-3-27 23:59:59' and no = '12314321231';
+----------------------------------+---------+-----------+-----------------------------------------------------------------+---------------------------------------------------------------------------------+
| id                               | estRows | task      | access object                                                   | operator info                                                                   |
+----------------------------------+---------+-----------+-----------------------------------------------------------------+---------------------------------------------------------------------------------+
| StreamAgg_12                     | 1.00    | root      |                                                                 | funcs:count(1)->Column#31                                                       |
| └─IndexLookUp_28                 | 0.25    | root      |                                                                 |                                                                                 |
|   ├─IndexRangeScan_25(Build)     | 250.00  | cop[tikv] | table:test, partition:p202403, index:idx_createTime(createTime) | range:[2024-03-27 00:00:00,2024-03-27 23:59:59], keep order:false, stats:pseudo |
|   └─Selection_27(Probe)          | 0.25    | cop[tikv] |                                                                 | eq(test.test.no, "12314321231")                                                 |
|     └─TableRowIDScan_26          | 250.00  | cop[tikv] | table:test, partition:p202403                                   | keep order:false, stats:pseudo                                                  |
+----------------------------------+---------+-----------+-----------------------------------------------------------------+---------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

| username: TiDBer_jYQINSnf | Original post link

I feel this table cannot be handled without TiFlash, but adding TiFlash means more than one extra replica, which brings the total to five replicas. Then the cost has no advantage :man_shrugging:

| username: 有猫万事足 | Original post link

You raise a very practical issue. My two low-spec TiFlash nodes, for example, share a machine with other systems, mainly for cost reasons.

I think you can start with a low-spec machine. Once it is in use, someone may be willing to pay to keep it stable; and if nobody cares, the cost stays low anyway. A 4-core, 8 GB deployment should be acceptable. Just give it a try; if you can produce reports showing the improvement, the cost question may resolve itself.

| username: 托马斯滑板鞋 | Original post link

:joy: That means TiDB will automatically add an index on the partition key, which is no different from directly creating an index.

| username: tidb菜鸟一只 | Original post link

Isn’t it enough to create a composite index on no and createTime for this SQL?
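
A sketch of that suggestion (the index name is illustrative; as the next reply notes, it only pays off if no is selective):

-- Equality column first, then the range column: the query becomes one tight
-- index range scan, and count(1) needs no table lookups because the index
-- covers both conditions.
ALTER TABLE test ADD INDEX idx_no_createtime (no, createTime);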

| username: TiDBer_jYQINSnf | Original post link

No. The no column has low cardinality, only a few distinct values. This SQL is just an example; in reality a query form is built on top of this table, with each column as an input field. The only condition we can enforce is the date, which is mandatory; the other columns can be combined arbitrarily, and the query timeout is 30 seconds.
With sharding, such queries were still somewhat usable, but replacing it with TiDB looks challenging.