Questions about Shard Index Feature

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: shard index 特性疑问

| username: TiDBer_Wal2dh9o

TiDB v6.1 supports hotspot secondary index tidb_shard, and there are a few questions:

  1. In the PR list (planner,expression,util: scatter hotspot index in certain scenes by expression index by TonsnakeLin · Pull Request #30659 · pingcap/tidb · GitHub), I only see how select is supported, but how is insert supported? That is, in the index (tidb_shard(a), a), does the prefix tidb_shard(a) need to be concatenated in front of a and stored in TiKV? If not, how is the write hotspot dispersed?
  2. Currently, there are many limitations on shard index (tidb/docs/design/2022-01-04-integer-shard-index.md at master · pingcap/tidb · GitHub). CockroachDB has also implemented shard-index (Index Sequential Keys with Hash-sharded Indexes), but with almost no limitations. Has TiDB referred to CockroachDB? Are there any plans for further improvements?
| username: jansu-dev | Original post link

  1. Yes, it acts as a scatter column in the following format:
    Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue
    Value: rowID
    
  2. Currently, I haven’t found any follow-up optimization plans. May I ask what the specific case is for this purpose?
| username: h5n1 | Original post link

Support hash-sharded INDEX for sequential column like time

| username: TiDBer_Wal2dh9o | Original post link

Currently, shard index has too many limitations, only supporting equality queries and not range queries, etc. This feature is somewhat impractical in business applications.

| username: TiDBer_Wal2dh9o | Original post link

The limitations listed here:

  • Non-equality queries cannot use indexes.
  • When AND and OR are mixed in the query condition and the outermost operator is AND, SHARD INDEX cannot be used.
  • GROUP BY cannot use SHARD INDEX.
  • ORDER BY cannot use SHARD INDEX.
  • The ON clause cannot use SHARD INDEX.
  • Subqueries in WHERE cannot use SHARD INDEX.
  • SHARD INDEX can only scatter unique indexes of integer fields.
  • SHARD INDEX joint indexes may become invalid.
  • SHARD INDEX cannot follow the FastPlan process, affecting optimizer performance.
  • SHARD INDEX cannot use execution plan caching.

There are also some limitations: only supports unique indexes and supports int type input.

| username: TiDBer_Wal2dh9o | Original post link

I think using hash with bucket = N is pretty good. TiDB’s approach of using uk((tidb_shard(a)), a) is quite strange and doesn’t support specifying the number of buckets.

| username: Min_Chen | Original post link

There are currently no plans to remove these limitations. Please keep an eye on the roadmap and release notes.