Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb数据表用你自增好,还是random好
Is it better to use auto-increment or random for TiDB data tables? If using auto-increment, will there be hotspot issues?
Random is good, auto-increment will cause hotspot issues.
However, in business scenarios, there might be cases where deep pagination queries are performed using IDs. It seems that not using auto-increment IDs might make such queries difficult.
If the business must rely on the continuous increment of the ID, then you can only use AUTO_INCREMENT.
Hotspots may not necessarily occur; it depends on your business volume. If you are using NVMe SSDs and have a requirement to write more than 100,000 records per second, you might want to consider this issue. If the I/O load is not high, auto-increment should suffice.
When creating a table, pre-split the regions to avoid hotspot issues. Just be aware that you need to start importing data immediately after creating the table, otherwise the regions will merge.
Well, proceed according to the business situation.
This might be useful for historical data migration.
Design according to business requirements
TiDB’s use of auto-increment is different from MySQL, and different versions of TiDB also handle auto-increment differently.
AUTO_INCREMENT
is a column attribute used to automatically fill in default column values. When the INSERT
statement does not specify a specific value for the AUTO_INCREMENT
column, the system will automatically assign a value to that column.
For performance reasons, auto-increment numbers are assigned in batches to each TiDB server (by default, 30,000 values), ensuring uniqueness, but the values assigned to the INSERT
statement are only monotonic on a single TiDB server.
In various versions of TiDB, setting AUTO_ID_CACHE
to 1
indicates that TiDB no longer caches IDs, but the implementation differs across versions:
- For versions prior to TiDB v6.4.0, since each ID allocation requires a TiKV transaction to complete the persistent modification of the
AUTO_INCREMENT
value, setting AUTO_ID_CACHE
to 1
results in a performance drop.
- For versions v6.4.0 and above, due to the introduction of a centralized allocation service, modifying the
AUTO_INCREMENT
value is merely an in-memory operation within the TiDB service process, making it faster compared to previous versions.
- Setting
AUTO_ID_CACHE
to 1
means TiDB uses the default cache size of 30000
.
Sure, please provide the Chinese text you would like to translate.
For low concurrency, use auto-increment; for high concurrency, use random.
If there are no extreme requirements for write performance, auto-increment is sufficient.
Primary keys are generally not used for business queries. Deep pagination is not recommended as it performs poorly. If the write volume is high, use random; otherwise, use auto-increment.
Random is good, auto-increment will cause hotspot issues, even if auto-increment is not globally continuous.
Auto-increment, unless there are specific business requirements.
If there is deep pagination, it is recommended to use auto-increment.
It depends on whether it is used for the front end or the back end. For the front end, pagination is needed, so use AUTO_INCREMENT. If it is purely for back-end business and you are worried about hotspots, then use random.