The Issue of Secondary Index Write Hotspots

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 二级索引写入热点的问题

| username: Raymond

Hello everyone, I am wondering if there is a write hotspot issue with secondary indexes. From TiDB’s perspective, is the only way to scatter hotspots currently by using the split table method? For example:

Splitting the index into 10 regions:

split table t1 index index_date between 1991 and 2000 regions 10;

This would scatter the data from 1991 to 2000 in the index_date index of table t1 into 10 regions.

However, I have two thoughts:

  1. For write hotspots caused by inserts, if the secondary index is a time-type field and is inserted sequentially, I think the split table method to scatter index regions ultimately won’t be very effective. After scattering the index regions, the index will still be inserted incrementally, still causing hotspots.

  2. For update hotspots, I think it might be effective. If the updated data is always concentrated in some index regions, scattering these index regions into multiple regions can effectively solve the secondary index region issue.

So, currently, without adjusting the index, is there no good way to solve the secondary index hotspot caused by incremental inserts?

| username: xfworld | Original post link

In this case, you can additionally consider using a hash method to distribute the data, for example:
XXX user + createTime
XXX user + updateTime

By doing this, using the user’s information, you achieve secondary distribution, and the issue of index update hotspots can also be resolved. However, when querying, you need to consider the hit rate of the index conditions.

Besides this, there are other methods with similar principles.

| username: 我是咖啡哥 | Original post link

Haha, that’s how it is with incremental data. When inserting, you hope it separates, but when querying, you hope it’s together.

| username: Raymond | Original post link

Dates are basically repetitive, is this suitable for hashing?

| username: Raymond | Original post link

TiDB is said to be very difficult.

| username: BraveChen | Original post link

You can try to handle it the way I handle hotspots, by breaking it down by region. Column - What should be noted when migrating business from MySQL to TiDB? | TiDB Community

| username: Raymond | Original post link

Okay, thanks brother, but this is a write hotspot, and load base split cannot solve it.

| username: alfred | Original post link

There is currently no particularly good solution for the write hotspot issue caused by monotonically increasing indexes. Oracle’s proposed solutions include transforming the index into reverse key indexes or hash partitioning the indexes, but there are some usage limitations. At present, TiDB does not seem to support hash indexes.

| username: Raymond | Original post link

Teacher, hello, I would like to ask, for a single-machine database like Oracle, with incrementally increasing index inserts, the insertion performance and query performance should be relatively good, right?

| username: tidb菜鸟一只 | Original post link

Oracle’s time-based indexing certainly offers good query performance, but it can impact insertion performance and create hot blocks. Otherwise, Oracle wouldn’t have provided solutions like reverse key indexes. However, I think the biggest issue with using time fields as indexes is that if the table is large and statistics are not collected frequently, it can significantly affect the execution plans of related SQL queries. Often, due to the lack of the latest time information in the statistics, the system might automatically choose to use the time index, which can lead to considerable performance issues.

| username: Raymond | Original post link

Hello, teacher.
I don’t understand Oracle, but will this kind of actual insertion generate hot blocks?
When Oracle inserts, it should also use this kind of WAL method to insert. Initially, it doesn’t write directly to the disk data blocks. Will this also form hot blocks?

| username: BraveChen | Original post link

It’s the same, both involve splitting regions. Both reading and writing are done on regions, so it is useful. You just need to identify the regions with high write traffic.

| username: Raymond | Original post link

The official documentation mentions read traffic.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.