Is it necessary to pre-split indexes when performing bulk loading?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在进行大批量加载的时候是否需要预先split索引

| username: dxss-lee

When creating a table in TiDB, you can pre-shard it with shard_bit (`SHARD_ROW_ID_BITS`, together with `PRE_SPLIT_REGIONS` to pre-split the row data into multiple regions). However, the table's index still has only one region. Will this single index region become a write hotspot while data is being inserted? And would splitting the index's region in advance speed up data loading?

| username: h5n1

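If the indexed values are monotonically increasing or decreasing, inserts will create a hotspot, and pre-splitting will not solve this: every new value falls at the same end of the key range, so all writes still go to a single region. Hash partitioning may help to some extent. Starting from version 6.0, there is also a shard index feature that can scatter monotonically increasing indexes on integer columns.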
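The hotspot argument can be made concrete with a toy model. This is an illustrative Python simulation, not TiDB code: regions are represented only by their sorted split keys, `bisect` finds the region that owns each key, and `shard_prefix` uses MD5 as an arbitrary stand-in hash, not the real algorithm behind TiDB's `tidb_shard()`. It compares 10,000 increasing inserts into a pre-split plain index with the same inserts into an index keyed by a (shard, value) pair.

```python
import bisect
import hashlib


def region_of(key, boundaries):
    """Index of the region containing key.

    boundaries are sorted split keys; region i covers
    [boundaries[i-1], boundaries[i]), with open ends at both extremes.
    """
    return bisect.bisect_right(boundaries, key)


def shard_prefix(value, shards):
    # Stand-in hash for illustration only (NOT TiDB's tidb_shard()).
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % shards


def hit_counts(keys, boundaries):
    """Count how many keys land in each region."""
    counts = [0] * (len(boundaries) + 1)
    for k in keys:
        counts[region_of(k, boundaries)] += 1
    return counts


# Existing index values span 0..999_999; new inserts keep increasing.
new_values = range(1_000_000, 1_010_000)

# Plain index pre-split into 4 regions over the existing value range:
# all 10,000 inserts land in the last region.
plain_splits = [250_000, 500_000, 750_000]
print("plain index:  ", hit_counts(new_values, plain_splits))

# Sharded index: key is (shard, value), pre-split on the shard prefix,
# so inserts spread across all 4 regions.
SHARDS = 4
sharded_splits = [(s, 0) for s in range(1, SHARDS)]
sharded_keys = [(shard_prefix(v, SHARDS), v) for v in new_values]
print("sharded index:", hit_counts(sharded_keys, sharded_splits))
```

Because every new value is larger than all existing ones, the plain index sends every insert to the last region regardless of how many split points exist. The shard prefix breaks that ordering, which is the effect hash partitioning and the shard index aim for.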

| username: dxss-lee

For an index on a GUID column, whose values are not auto-incremented, can splitting reduce hotspots?

| username: h5n1

As long as the values are not incremental, splitting should have some effect. Note, however, that after pre-splitting the new regions still reside on the original TiKV node. You may need to set the scatter variable (`tidb_scatter_region`) so that they are balanced across the other TiKV nodes.
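For a GUID index, split points can be chosen up front because random GUIDs are roughly uniform over their leading hex digits. The Python sketch below generates a TiDB `SPLIT TABLE ... INDEX ... BY` statement with evenly spaced hex-prefix boundaries. It assumes the GUIDs are stored as lower-case hex strings; the table name `orders` and index name `idx_guid` are hypothetical, and the region count and prefix length are placeholders to tune for your data.

```python
def guid_split_points(regions, prefix_len=4):
    """Evenly spaced hex-prefix split points for an index on random GUIDs.

    Assumes GUIDs are stored as lower-case hex text, so their leading
    characters are roughly uniform over 0-9a-f. Returns regions - 1 keys.
    """
    if regions < 2:
        return []
    space = 16 ** prefix_len
    return [format(i * space // regions, f"0{prefix_len}x") for i in range(1, regions)]


def split_index_sql(table, index, points):
    """Build a SPLIT TABLE ... INDEX ... BY statement from split keys."""
    values = ", ".join(f"('{p}')" for p in points)
    return f"SPLIT TABLE {table} INDEX {index} BY {values};"


# Hypothetical table/index names, for illustration only.
print(split_index_sql("orders", "idx_guid", guid_split_points(4)))
```

For four regions this prints `SPLIT TABLE orders INDEX idx_guid BY ('4000'), ('8000'), ('c000');`. As noted in the reply above, the resulting regions may still need to be scattered to other TiKV nodes before the write load is actually spread out.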
