[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
A new table was created, and I wanted to add an index. I found that adding the index requires about 3M. I would like to ask what the principle of adding an index is? If adding an index requires traversing the entire table data, then an empty table with no data shouldn’t require 3M, right?
Creating an index on an empty table still requires scanning the region where the table is located.
An index is a DDL statement. If the TiDB server initiating the index creation is a worker, it will first find the owner TiDB server, parse the create index statement into a job, and place it in the add index queue to be executed serially with all other index creation statements.
Creating an index requires updating schema information and statistics. The schema information needs to be written to TiKV and also requires PD to notify TiDB to update the schema cache.
I reproduced it, and it indeed takes 3 seconds. Checking the logs, the main time consumption is on this log entry.
[2022/11/16 02:18:19.173 +00:00] [INFO] [ddl.go:1230] ["sleep before DDL finishes to make async commit and 1PC safe"] [duration=2.5s]
Due to the particularity and complexity of TiDB DDL, it is indeed much slower compared to other databases. However, DDL is generally a one-time task and can be mostly ignored. Additionally, you can refer to the reply in the following post to see if it helps: TIDB为什么 给一个空表创建索引会很慢呢 - #7,来自 centosredhat - TiDB 的问答社区
Also, there is a blog explaining the DDL principles: TiDB 源码阅读系列文章(十七)DDL 源码解析 | PingCAP