Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 110亿表建索引,如何做到不影响业务 (How to build an index on an 11-billion-row table without affecting the business)
[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.2
[Encountered Issue: Issue Phenomenon and Impact] How to create an index on a table with 11 billion rows without affecting the business
In the production environment, the selectivity of an existing index is poor, so we plan to adjust it by creating a new index and then dropping the old one. Is there a good way to avoid impacting the business? In the test environment we used the fast index creation (fast reorg) method and adjusted the parallelism, but the CPU spiked.
How did you adjust the parallelism? Before version 8.0, the tidb_ddl_reorg_worker_cnt parameter cannot adjust the concurrency of fast-reorg DDL. Refer to: TiDB 8.0.0 Release Notes | PingCAP Documentation Center
I suggest you turn off fast reorg and create the index slowly.
It can be adjusted.
set global tidb_ddl_reorg_worker_cnt=4;
It won’t take effect even if you adjust it.
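A minimal sketch of the throttled approach suggested above (disable fast reorg so the worker/batch knobs apply), staying on v7.1.2; the values and the names big_table, idx_new, idx_old, col_a, col_b are illustrative only:

```sql
-- Fall back to the traditional transaction-based backfill so that
-- the worker-count and batch-size knobs take effect
SET GLOBAL tidb_ddl_enable_fast_reorg = OFF;

-- Keep backfill concurrency and batch size modest to limit CPU pressure
SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
SET GLOBAL tidb_ddl_reorg_batch_size = 128;

-- Create the new index first; drop the old one only after verifying
-- that query plans pick up the new index
ALTER TABLE big_table ADD INDEX idx_new (col_a, col_b);
-- ALTER TABLE big_table DROP INDEX idx_old;
```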
Version 7.5 will have a good solution that does not affect business operations at all. You can temporarily scale out one or more TiDB instances and set tidb_service_scope on these newly scaled-out instances to background.
When adding the index, only the scaled-out instances execute the add-index backfill; after it completes, you can scale them back in, which minimizes the impact on the existing business. In addition, when multiple TiDB instances build the index in parallel, the overall build time drops significantly: in my own tests, adding an index with 2 instances took 66% of the time of a single TiDB instance, and with 3 instances it would be even lower.
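A sketch of this scale-out approach, assuming a v7.5+ cluster with the Distributed eXecution Framework; the table, index, and column names (big_table, idx_new, col_a, col_b) are placeholders:

```sql
-- Enable fast reorg plus distributed execution of ADD INDEX (v7.5+)
SET GLOBAL tidb_ddl_enable_fast_reorg = ON;
SET GLOBAL tidb_enable_dist_task = ON;

-- Run on each newly scaled-out TiDB instance so that only these
-- "background" nodes are scheduled to do the index backfill
SET GLOBAL tidb_service_scope = 'background';

-- Submit the DDL from any TiDB instance; the backfill runs on the
-- background-scope instances, which can be scaled back in afterwards
ALTER TABLE big_table ADD INDEX idx_new (col_a, col_b);
```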
If you want stability, you can’t rush, but being too slow also affects the business, which is indeed a headache. How many GB is a table with 10 billion rows?
The actual storage size is affected by many factors, including the data types, compression ratio, and the number and size of indexes.
Assuming each row is about 1 KB, 10 billion rows come to roughly 9,536 GB (10,000,000,000 KB ÷ 1024² ≈ 9,536 GB).
1.5 TB, and the rows are short (a narrow table).
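For reference, a rough way to cross-check the reported size from inside TiDB (a sketch; db_name and big_table are placeholders, and both views return estimates):

```sql
-- Statistics-based estimate of data + index size
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS est_size_gb
FROM information_schema.tables
WHERE table_schema = 'db_name' AND table_name = 'big_table';

-- Region-level footprint in TiKV (TABLE_SIZE is reported in MiB)
SELECT table_schema, table_name, table_size, region_count
FROM information_schema.table_storage_stats
WHERE table_schema = 'db_name' AND table_name = 'big_table';
```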