Questions about Shard Index Feature

translator_bot · June 23, 2024, 9:53am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: shard index 特性疑问

| username: TiDBer_Wal2dh9o

TiDB v6.1 supports hotspot secondary index tidb_shard, and there are a few questions:

In the PR list (planner,expression,util: scatter hotspot index in certain scenes by expression index by TonsnakeLin · Pull Request #30659 · pingcap/tidb · GitHub), I only see how select is supported, but how is insert supported? That is, in the index (tidb_shard(a), a), does the prefix tidb_shard(a) need to be concatenated in front of a and stored in TiKV? If not, how is the write hotspot dispersed?
Currently, there are many limitations on shard index (tidb/docs/design/2022-01-04-integer-shard-index.md at master · pingcap/tidb · GitHub). CockroachDB has also implemented shard-index (Index Sequential Keys with Hash-sharded Indexes), but with almost no limitations. Has TiDB referred to CockroachDB? Are there any plans for further improvements?

translator_bot · June 23, 2024, 9:53am

| username: jansu-dev | Original post link

Yes, it acts as a scatter column in the following format:

Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue
Value: rowID

Currently, I haven’t found any follow-up optimization plans. May I ask what the specific case is for this purpose?

translator_bot · June 23, 2024, 9:53am

| username: h5n1 | Original post link

Support hash-sharded INDEX for sequential column like time

translator_bot · June 23, 2024, 9:53am

| username: TiDBer_Wal2dh9o | Original post link

Currently, shard index has too many limitations, only supporting equality queries and not range queries, etc. This feature is somewhat impractical in business applications.

translator_bot · June 23, 2024, 9:53am

| username: TiDBer_Wal2dh9o | Original post link

The limitations listed here:

Non-equality queries cannot use indexes.
When AND and OR are mixed in the query condition and the outermost operator is AND, SHARD INDEX cannot be used.
GROUP BY cannot use SHARD INDEX.
ORDER BY cannot use SHARD INDEX.
The ON clause cannot use SHARD INDEX.
Subqueries in WHERE cannot use SHARD INDEX.
SHARD INDEX can only scatter unique indexes of integer fields.
SHARD INDEX joint indexes may become invalid.
SHARD INDEX cannot follow the FastPlan process, affecting optimizer performance.
SHARD INDEX cannot use execution plan caching.

There are also some limitations: only supports unique indexes and supports int type input.

translator_bot · June 23, 2024, 9:53am

| username: TiDBer_Wal2dh9o | Original post link

I think using hash with bucket = N is pretty good. TiDB’s approach of using uk((tidb_shard(a)), a) is quite strange and doesn’t support specifying the number of buckets.

translator_bot · June 23, 2024, 9:53am

| username: Min_Chen | Original post link

There are currently no plans to remove these limitations. Please keep an eye on the roadmap and release notes.