Is it better to use auto-increment or random for TiDB data tables?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb数据表用你自增好,还是random好

| username: cy6301567

Is it better to use auto-increment or random for TiDB data tables? If using auto-increment, will there be hotspot issues?

| username: tidb菜鸟一只 | Original post link

Random is good, auto-increment will cause hotspot issues.

| username: cy6301567 | Original post link

However, in business scenarios, there might be cases where deep pagination queries are performed using IDs. It seems that not using auto-increment IDs might make such queries difficult.

| username: tidb菜鸟一只 | Original post link

If the business must rely on the continuous increment of the ID, then you can only use AUTO_INCREMENT.

| username: zhanggame1 | Original post link

Hotspots may not necessarily occur; it depends on your business volume. If you are using NVMe SSDs and have a requirement to write more than 100,000 records per second, you might want to consider this issue. If the I/O load is not high, auto-increment should suffice.

| username: caiyfc | Original post link

When creating a table, pre-split the regions to avoid hotspot issues. Just be aware that you need to start importing data immediately after creating the table, otherwise the regions will merge.

| username: cy6301567 | Original post link

Well, proceed according to the business situation.

| username: cy6301567 | Original post link

This might be useful for historical data migration.

| username: YuchongXU | Original post link

Design according to business requirements

| username: zhanggame1 | Original post link

TiDB’s use of auto-increment is different from MySQL, and different versions of TiDB also handle auto-increment differently.

AUTO_INCREMENT is a column attribute used to automatically fill in default column values. When the INSERT statement does not specify a specific value for the AUTO_INCREMENT column, the system will automatically assign a value to that column.

For performance reasons, auto-increment numbers are assigned in batches to each TiDB server (by default, 30,000 values), ensuring uniqueness, but the values assigned to the INSERT statement are only monotonic on a single TiDB server.

In various versions of TiDB, setting AUTO_ID_CACHE to 1 indicates that TiDB no longer caches IDs, but the implementation differs across versions:

  • For versions prior to TiDB v6.4.0, since each ID allocation requires a TiKV transaction to complete the persistent modification of the AUTO_INCREMENT value, setting AUTO_ID_CACHE to 1 results in a performance drop.
  • For versions v6.4.0 and above, due to the introduction of a centralized allocation service, modifying the AUTO_INCREMENT value is merely an in-memory operation within the TiDB service process, making it faster compared to previous versions.
  • Setting AUTO_ID_CACHE to 1 means TiDB uses the default cache size of 30000.
| username: redgame | Original post link

Sure, please provide the Chinese text you would like to translate.

| username: 哈喽沃德 | Original post link

For low concurrency, use auto-increment; for high concurrency, use random.

| username: zhanggame1 | Original post link

If there are no extreme requirements for write performance, auto-increment is sufficient.

| username: xiexin | Original post link

Primary keys are generally not used for business queries. Deep pagination is not recommended as it performs poorly. If the write volume is high, use random; otherwise, use auto-increment.

| username: TiDBer_vfJBUcxl | Original post link

Auto-increment is fine.

| username: xingzhenxiang | Original post link

Random is good, auto-increment will cause hotspot issues, even if auto-increment is not globally continuous.

| username: 这里介绍不了我 | Original post link

Auto-increment, unless there are specific business requirements.

| username: chris-zhang | Original post link

If there is deep pagination, it is recommended to use auto-increment.

| username: zhaokede | Original post link

It depends on whether it is used for the front end or the back end. For the front end, pagination is needed, so use AUTO_INCREMENT. If it is purely for back-end business and you are worried about hotspots, then use random.

| username: zhang_2023 | Original post link

Random is good.