Hello everyone, I am wondering if there is a write hotspot issue with secondary indexes. From TiDB’s perspective, is the only way to scatter hotspots currently by using the split table method? For example:
Splitting the index into 10 regions:
split table t1 index index_date between 1991 and 2000 regions 10;
This would scatter the data from 1991 to 2000 in the index_date index of table t1 into 10 regions.
However, I have two thoughts:
For write hotspots caused by inserts, if the secondary index is a time-type field and is inserted sequentially, I think the split table method to scatter index regions ultimately won’t be very effective. After scattering the index regions, the index will still be inserted incrementally, still causing hotspots.
For update hotspots, I think it might be effective. If the updated data is always concentrated in some index regions, scattering these index regions into multiple regions can effectively solve the secondary index region issue.
So, currently, without adjusting the index, is there no good way to solve the secondary index hotspot caused by incremental inserts?
In this case, you can additionally consider using a hash method to distribute the data, for example:
XXX user + createTime
XXX user + updateTime
By doing this, using the user’s information, you achieve secondary distribution, and the issue of index update hotspots can also be resolved. However, when querying, you need to consider the hit rate of the index conditions.
Besides this, there are other methods with similar principles.
There is currently no particularly good solution for the write hotspot issue caused by monotonically increasing indexes. Oracle’s proposed solutions include transforming the index into reverse key indexes or hash partitioning the indexes, but there are some usage limitations. At present, TiDB does not seem to support hash indexes.
Teacher, hello, I would like to ask, for a single-machine database like Oracle, with incrementally increasing index inserts, the insertion performance and query performance should be relatively good, right?
Oracle’s time-based indexing certainly offers good query performance, but it can impact insertion performance and create hot blocks. Otherwise, Oracle wouldn’t have provided solutions like reverse key indexes. However, I think the biggest issue with using time fields as indexes is that if the table is large and statistics are not collected frequently, it can significantly affect the execution plans of related SQL queries. Often, due to the lack of the latest time information in the statistics, the system might automatically choose to use the time index, which can lead to considerable performance issues.
Hello, teacher.
I don’t understand Oracle, but will this kind of actual insertion generate hot blocks?
When Oracle inserts, it should also use this kind of WAL method to insert. Initially, it doesn’t write directly to the disk data blocks. Will this also form hot blocks?
It’s the same, both involve splitting regions. Both reading and writing are done on regions, so it is useful. You just need to identify the regions with high write traffic.