Index Occupies Too Much Disk Space and Severely Affects Performance

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 索引占用磁盘空间过大,并且严重影响性能

| username: breakyang

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

Adding a composite index (datetime + UUID string) and a regular index (UUID string) to a table with 5 billion rows takes up nearly 1TB of space. On top of that, insert latency suddenly jumped from 400ms to 4s at some point. I am loading the data with batch inserts.

| username: 我是咖啡哥 | Original post link

How large is the table with a 1TB index? Are there many other indexes as well?
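You can get a rough data-vs-index breakdown from information_schema. A sketch (the schema and table names are placeholders, and the figures TiDB reports here are approximate):

```sql
-- Approximate data size vs. index size for one table.
SELECT table_name,
       data_length  / 1024 / 1024 / 1024 AS data_gb,
       index_length / 1024 / 1024 / 1024 AS index_gb
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND table_name   = 'your_table';
```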

| username: zhanggame1 | Original post link

It is best to avoid heavy writes while an index is being created.

| username: tidb菜鸟一只 | Original post link

The advantage of indexes is fast queries; the disadvantages are that they take up space and slow down high-concurrency inserts and updates.

| username: 像风一样的男子 | Original post link

Is the total size of this table with the index 1TB? It’s impossible for the index alone to be this large. Large-scale data insertion will definitely get slower over time.

| username: 有猫万事足 | Original post link

datetime = 8 bytes
uuid = 36 characters × 4 bytes (utf8mb4) = 144 bytes

5 billion index entries × (8 + 144) bytes = 5,000,000,000 × 152 ≈ 760 GB for the composite index alone, and you have two indexes, so reaching 1TB of storage is no surprise.

The issue lies in storing the UUID as a string.

The best practice is to store the UUID as BINARY(16) instead of a string.

You can even search for uuid+mysql on Bilibili. There should be a lot of interview guides telling you not to use uuid as a primary key in MySQL.

The BIN_TO_UUID() and UUID_TO_BIN() functions added in MySQL 8 should address this issue to some extent.
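For example, a minimal sketch (hypothetical table and column names; UUID_TO_BIN()/BIN_TO_UUID() are MySQL 8 functions that recent TiDB versions also support, so check your version first):

```sql
-- Minimal sketch: store the UUID as BINARY(16) rather than a 36-char string.
-- Table and column names are hypothetical.
CREATE TABLE t (
    id         BIGINT     NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_at DATETIME   NOT NULL,
    uid        BINARY(16) NOT NULL,
    KEY idx_created_uid (created_at, uid),
    KEY idx_uid (uid)
);

-- Insert: UUID_TO_BIN converts the text form to 16 bytes; the optional
-- second argument (1) swaps the time fields so version-1 UUIDs sort better.
INSERT INTO t (created_at, uid)
VALUES (NOW(), UUID_TO_BIN(UUID(), 1));

-- Query: convert back with BIN_TO_UUID for display.
SELECT BIN_TO_UUID(uid, 1) AS uid_text
FROM t
WHERE uid = UUID_TO_BIN('f81d4fae-7dec-11d0-a765-00aa00c31e1c', 1);
```

At 16 bytes instead of 144 per value, each UUID-bearing index entry shrinks by roughly 9x.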

| username: h5n1 | Original post link

Check the dashboard to see where the slow SQL is consuming time.
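If you prefer SQL to the Dashboard UI, the slow log is also queryable through information_schema. A sketch (narrow it to the time window of the latency spike):

```sql
-- Top 10 slowest user queries recorded in the slow log.
SELECT query_time, query
FROM information_schema.slow_query
WHERE is_internal = false
ORDER BY query_time DESC
LIMIT 10;
```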

| username: Fly-bird | Original post link

This data volume is too large.

| username: 大飞哥online | Original post link

The 1TB is probably data + index combined. Under heavy concurrent inserts, every index must be maintained on each write, so the larger the table gets, the slower inserts become. Alternatively, you could drop the indexes first, insert the data, and then recreate them afterward, as in the sketch below.
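For example, reusing the hypothetical table and index names from the sketch earlier in the thread:

```sql
-- Drop the secondary indexes before the bulk load...
ALTER TABLE t DROP INDEX idx_created_uid;
ALTER TABLE t DROP INDEX idx_uid;

-- ...run the batch inserts here...

-- ...then rebuild the indexes once the load is done.
ALTER TABLE t ADD INDEX idx_created_uid (created_at, uid);
ALTER TABLE t ADD INDEX idx_uid (uid);
```

Note that rebuilding an index over 5 billion rows is itself a long-running DDL job, so this trade-off mainly pays off for one-time bulk loads.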

| username: breakyang | Original post link

Before deleting the indexes: 3TB; after deleting them: 2TB.

| username: breakyang | Original post link

The commit log duration increased to 600ms.

| username: h5n1 | Original post link

It depends on the network and disk I/O performance of TiKV.