Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: sql语句查询时会访问PD server吗?
When executing a query like select * from t where id = 100;
, does the TiDB server convert the SQL statement into a KV query and then look up the region where the key is located in the PD server? If so, at what point in the SQL query execution is the LSM index used?
Thank you all, I couldn’t find documentation detailing the entire process of executing an SQL statement.
According to the key, query the TiKV information and region information from PD, and then query the data on the leader region. This process is cached in TiDB, so it is not queried every time. Once the TiKV is determined, it will perform a hierarchical search in RocksDB.
After identifying the region from PD, the query is handed over to the TiKV where the region is located. The corresponding region in TiKV then processes the request, ultimately reaching the RocksDB layer. However, all keys in the current TiKV are mixed together, essentially performing a KV query on RocksDB.
The question you asked should be about how indexes and LSM-trees work.
You are confusing indexes with LSM-trees. LSM-tree is the physical lookup principle, while an index is logical.
A region is a logical segment of RocksDB.
In the KV mapping, k equals table ID + primary key, or table ID + rowid, and v stores the data.
After creating an index, a new kv is added to k, where k equals table ID + primary key + index ID, and v stores the row number it points to.
During the query process, using an index results in scanning less data compared to a query without an index, making it faster.
When performing KV queries, LSM indexes can be used to accelerate query speed.
TiDB will cache the KV location information from PD, and will only query PD if the cache misses.
LSM tree is the data organization method at the bottom layer of TiKV.