Solution to Hotspot Issues - SHARD_ROW_ID_BITS

translator_bot · June 23, 2024, 3:22am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于热点问题的解决-SHARD_ROW_ID_BITS

| username: TiDBer_QMmNXr1T

[TiDB Usage Environment] Test
[TiDB Version] 5.4
[Problem Encountered] How to disperse hotspots in tables with primary keys? Non-clustered index tables can directly set the SHARD_ROW_ID_BITS parameter, but if the table itself has a primary key, it cannot be set. How to solve this?
[Reproduction Path]

mysql> CREATE TABLE t (a BIGINT PRIMARY KEY AUTO_RANDOM, b varchar(255) ) SHARD_ROW_ID_BITS = 4;
ERROR 8200 (HY000): Unsupported shard_row_id_bits for table with primary key as row id

[Problem Phenomenon and Impact]
All tables in the business have primary keys. Does this mean that hotspots cannot be resolved?

[Attachment]

translator_bot · June 23, 2024, 3:22am

| username: 小龙虾爱大龙虾 | Original post link

Please refer to TiDB 热点问题处理 | PingCAP 文档中心 and TiDB 热点问题处理 | PingCAP 文档中心

translator_bot · June 23, 2024, 3:22am

| username: 啦啦啦啦啦 | Original post link

AUTO_RANDOM itself is already scattered.

translator_bot · June 23, 2024, 3:22am

| username: TiDBer_QMmNXr1T | Original post link

Thank you, everyone. If hotspots still occur after sharding, how can we resolve them?

translator_bot · June 23, 2024, 3:22am

| username: 啦啦啦啦啦 | Original post link

Did you still encounter hotspots when using AUTO_RANDOM?

translator_bot · June 23, 2024, 3:22am

| username: TiDBer_QMmNXr1T | Original post link

I mean, hypothetically, if it happens again, should we continue to break it up? If we continue to break it up, how should we do it? Because currently, there is a business considering using TiDB to replace MySQL, and I need to take some situations into account in advance.

translator_bot · June 23, 2024, 3:22am

| username: 啦啦啦啦啦 | Original post link

AUTO_RANDOM itself is random. If it can still generate hotspots, the system pressure must be quite high. Consider adding more nodes.

translator_bot · June 23, 2024, 3:22am

| username: TiDBer_QMmNXr1T | Original post link

Alright, it seems that’s the only way. At least I know what to do when a certain situation occurs. Thank you.

translator_bot · June 23, 2024, 3:22am

| username: 啦啦啦啦啦 | Original post link

Another possibility is that a large number of requests frequently query a small table, forming a hotspot in a single Region. You can manually split the region or consider using the caching table feature in version 6.0 to optimize.

translator_bot · June 23, 2024, 3:22am

| username: TiDBer_QMmNXr1T | Original post link

Thank you. Could you provide a document link for manually splitting regions so I can study it?

translator_bot · June 23, 2024, 3:22am

| username: TiDBer_QMmNXr1T | Original post link

By the way, if a heatmap shows that an index has become a hotspot, what should be done? The index is a non-clustered index, for example:

translator_bot · June 23, 2024, 3:22am

| username: 小龙虾爱大龙虾 | Original post link

The official TiDB documentation has a dedicated section on handling hotspot issues. If these methods cannot solve your problem, you might consider choosing other technologies, such as Redis.

translator_bot · June 23, 2024, 3:22am

| username: 啦啦啦啦啦 | Original post link

Check out the usage of pd-ctl

translator_bot · June 23, 2024, 3:22am

| username: h5n1 | Original post link

Hash partitioning

translator_bot · June 23, 2024, 3:22am

| username: OnTheRoad | Original post link

By using AUTO_RANDOM, the primary key read and write operations on the table have been dispersed, so hotspots will not occur. However, if a secondary index is created in the table, high concurrency might cause the secondary index to form hotspots. You can continue to disperse the index.

translator_bot · June 23, 2024, 3:22am

| username: OnTheRoad | Original post link

Scatter index

translator_bot · June 23, 2024, 3:22am

| username: HACK | Original post link

If an index is created on a date field (with date precision to the day or hour), will such a secondary index easily cause hotspots under high concurrency?

translator_bot · June 23, 2024, 3:22am

| username: OnTheRoad | Original post link

Index hotspots typically occur when data is inserted into monotonically increasing fields at the same time, or when a large number of duplicate values are inserted simultaneously. Here, index hotspots refer to indexes on monotonically increasing fields or duplicate value fields.

Based on the KV mapping principle of indexes, it can be understood that:

The index key structure in the primary key or unique index of a non-clustered table is: TableID_IndexID_IndexColumnValue. When data is batch inserted into a monotonically increasing field at the same time, the index key will inevitably be continuous and will be written into a single Region, forming a write hotspot. Therefore, if the date field in your batch insert data is an automatically obtained value like now(), it will create a hotspot.
The key structure of a regular secondary index is: TableID_IndexID_IndexColumnValue_TableRowID. If a large number of duplicate values are inserted at the same time, it will also create a write hotspot.

translator_bot · June 23, 2024, 3:22am

| username: 特雷西-迈克-格雷迪 | Original post link

auto_random cannot guarantee orderliness, please take note of this.

translator_bot · June 23, 2024, 3:22am

| username: HACK | Original post link

auto_random means random