Should Every Table Have a Bigint Meaningless Primary Key?

translator_bot · June 22, 2024, 4:12am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 是不是应该每个表都创建一个bigint 无意义的主键

| username: breakyang

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.0
Should every table have an AUTO RANDOM primary key?
In my business, the business primary key is a UUID formatted string. Should I use the business primary key or a bigint AUTO RANDOM primary key?

translator_bot · June 22, 2024, 4:12am

| username: zhanggame1 | Original post link

If this table does not have extreme write performance requirements, it is completely unnecessary.

translator_bot · June 22, 2024, 4:12am

| username: breakyang | Original post link

I saw in the official documentation that a hotspot issue can occur with a large number of writes, so I created an AUTO RANDOM ID for each table. What constitutes a large number of writes?

translator_bot · June 22, 2024, 4:12am

| username: zhanggame1 | Original post link

I tested writing 50,000 records per second using auto-increment on a regular consumer SSD without any issues. I guess you don’t have such a requirement either.

translator_bot · June 22, 2024, 4:12am

| username: tidb菜鸟一只 | Original post link

You can create the table as a clustered index table and use the UUID column as the primary key, which will not generate write hotspots.

translator_bot · June 22, 2024, 4:12am

| username: zhanggame1 | Original post link

In my tests with TiDB, the SSD performance is not too bad, but single table writes consume a lot of CPU and memory resources. To improve speed, it’s more effective to increase the performance of TiDB nodes, while adding TiKV nodes doesn’t have much impact.

translator_bot · June 22, 2024, 4:12am

| username: redgame | Original post link

AUTO RANDOM primary keys are not suitable for all situations.

translator_bot · June 22, 2024, 4:12am

| username: breakyang | Original post link

What if it’s a composite primary key?

translator_bot · June 22, 2024, 4:12am

| username: breakyang | Original post link

Can you please explain in what situations it is not suitable? I have many tables with composite primary keys, how should I handle this situation better?

translator_bot · June 22, 2024, 4:12am

| username: zhanggame1 | Original post link

The composite primary key uses a hidden auto-increment bigint column. Whether there is a bottleneck, I think it still needs to be stress-tested to see.

translator_bot · June 22, 2024, 4:12am

| username: Kongdom | Original post link

Personally, I feel it’s completely unnecessary. It’s better to set the primary key according to the business requirements.

translator_bot · June 22, 2024, 4:12am

| username: cassblanca | Original post link

It depends on what model you use to create the table and the business requirements.

translator_bot · June 22, 2024, 4:12am

| username: breakyang | Original post link

If you don’t understand, just ask. What does the model refer to here?
The reason I want to use meaningless IDs is due to two considerations:

Many of my tables have primary keys that are composite keys, and the scale is very large, such as tens of billions. I am worried that a large number of writes will cause hotspots.
Similarly, for a table with a data scale of tens of billions, I might create three to four indexes. If I use business primary keys like UUID, the index size will take up too much space.

I wonder if this consideration is reasonable.

translator_bot · June 22, 2024, 4:12am

| username: zhanggame1 | Original post link

There is no issue with writing to the composite primary key, and clustered tables write much faster than non-clustered tables.
You can test the write issue and you’ll see that it is greatly related to the number of indexes on the table. Adding an index significantly reduces the speed.
If you care about write speed, don’t add extra indexes.

translator_bot · June 22, 2024, 4:12am

| username: YuchongXU | Original post link

According to business needs.

translator_bot · June 22, 2024, 4:12am

| username: cy6301567 | Original post link

If there are business requirements and dependencies on auto-increment for deep pagination, you can use auto-increment. Otherwise, it is generally recommended to use random.

translator_bot · June 22, 2024, 4:12am

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.