Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: mysql改成tidb咨询
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
Migrating a MySQL architecture to TiDB:
A log (flow) table will become a single table in TiDB.
- In the most extreme case this table could receive around 100 million rows per day. Will an auto-increment primary key cause write hotspot issues? Because historical data has to be migrated, we cannot switch to random IDs directly. Is there a concrete, feasible optimization? After moving to TiDB, is it possible to change to random IDs online?
- If this table is also partitioned in TiDB, with data retained for at most 2 months and older data swapped out via EXCHANGE PARTITION, will the number of regions belonging to this table decrease accordingly? And if the ID is changed to random as in point 1, could duplicate IDs appear in the exchanged-out partitions?
- If you are willing to change the type of this field, the solution has already been given above (see the sketch after the CREATE TABLE statement below).
If you don't want to change the field type, you can use a non-clustered table together with SHARD_ROW_ID_BITS + PRE_SPLIT_REGIONS.
The number of pre-split regions can be adjusted as needed. In my experience, if only a few machines show high load during heavy writes instead of all of them, the split count is probably too small.
The CREATE TABLE statement looks like the one below; the commented parts are required:
CREATE TABLE `UserCardLog_date` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`Id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB /*T! SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=4 */;
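For the first option (changing the field type), my assumption is that the solution referred to above is AUTO_RANDOM, which requires the primary key to be a BIGINT clustered primary key; treat the following as a hedged sketch with a hypothetical table name, not the exact statement from the earlier reply:
-- Hypothetical table; AUTO_RANDOM(5) shard bits chosen arbitrarily for illustration.
CREATE TABLE `UserCardLog_date_v2` (
`Id` bigint NOT NULL /*T![auto_rand] AUTO_RANDOM(5) */,
PRIMARY KEY (`Id`) /*T![clustered_index] CLUSTERED */
);
Whether an existing auto-increment primary key can be altered to AUTO_RANDOM online depends on the TiDB version, so check the documentation for your release before planning the migration.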
Documentation: refer to the TiDB docs on SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS.
- The ID will not be duplicated in the exchanged-out partitions (a sketch of the exchange step follows).
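For the partition-swap step asked about above, TiDB supports ALTER TABLE ... EXCHANGE PARTITION ... WITH TABLE ...; this is only a minimal sketch with hypothetical table and partition names, and on older TiDB versions the tidb_enable_exchange_partition variable may need to be switched on first:
-- Hypothetical names: `UserCardLog_archive`, partition `p202401`.
-- `UserCardLog_archive` must be a non-partitioned table with exactly the same structure.
-- Older TiDB versions: SET GLOBAL tidb_enable_exchange_partition = ON;
ALTER TABLE `UserCardLog_date` EXCHANGE PARTITION `p202401` WITH TABLE `UserCardLog_archive`;
The exchanged-out rows keep whatever IDs they already had; after the swap they live in the archive table and the emptied partition can be dropped or reused.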
Hello, is there a specific formula for choosing SHARD_ROW_ID_BITS, or what does it depend on: the number of underlying TiKV nodes, the insert concurrency, or something else? In our current tests we have about 4.8 billion rows and around 23,000 regions underneath.
Prerequisites:
PRE_SPLIT_REGIONS=4 means the table is pre-split into 2^4 = 16 regions, as illustrated below.
Also, PRE_SPLIT_REGIONS cannot be larger than SHARD_ROW_ID_BITS.
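As a concrete illustration of the power-of-two relationship (the table name is made up for the example):
-- Hypothetical example table `t_log`.
-- SHARD_ROW_ID_BITS=6 scatters _tidb_rowid across 2^6 = 64 shards;
-- PRE_SPLIT_REGIONS=4 pre-splits the table into 2^4 = 16 regions (and 4 <= 6, as required).
CREATE TABLE `t_log` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`Id`) /*T![clustered_index] NONCLUSTERED */
) /*T! SHARD_ROW_ID_BITS=6 PRE_SPLIT_REGIONS=4 */;
-- List the table's regions and their leader stores to confirm the pre-split took effect:
SHOW TABLE `t_log` REGIONS;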
Relationship with the number of TiKV nodes:
In my actual usage I have 4 TiKV nodes. With PRE_SPLIT_REGIONS=2 (pre-split into 4 regions), the write load on one TiKV never ramps up. With PRE_SPLIT_REGIONS=4 (pre-split into 16 regions), the CPU and IO of all 4 TiKVs reach a fairly high level under heavy writes. If you have more TiKV instances, I think you can pre-split correspondingly more regions. In my experience, making the number of pre-split regions exactly match the number of TiKV instances does not necessarily balance the load; somewhat more regions tend to balance it better.
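If you want to check how a table's regions and leaders are actually spread across TiKV stores, one way, assuming the information_schema system tables are available in your version (database and table names here are placeholders), is roughly:
-- Count region leaders per TiKV store for one table; 'test' and 'UserCardLog_date' are placeholders.
SELECT p.STORE_ID, COUNT(*) AS leader_regions
FROM information_schema.TIKV_REGION_STATUS s
JOIN information_schema.TIKV_REGION_PEERS p ON s.REGION_ID = p.REGION_ID
WHERE s.DB_NAME = 'test' AND s.TABLE_NAME = 'UserCardLog_date' AND p.IS_LEADER = 1
GROUP BY p.STORE_ID;
If leaders are concentrated on one or two stores while write load is high, that matches the symptom described above and usually means more pre-split regions (or scattering) are needed.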
Relationship with insert concurrency:
In my case the number of connections is a bit high but the TPS/QPS is very low, so the concurrency on any single table is even lower. I have no experience with this aspect and can't offer suggestions.
Sure, thank you very much for your help.
The write performance is related to the number of TiKV physical nodes and the performance of NVMe drives. Both factors need to be comprehensively considered and tested.
Okay, the TiKV nodes are going to be physical machines, while the others will be virtual machines.
TiKV can run on a virtual machine, but it is recommended to pass a dedicated data disk straight through to the VM so that IO performance is not lost to virtualized disks. For example, you can configure NVMe disk passthrough in ESXi; see:
VMware ESXi 6.7 Virtual Machine Using Host Passthrough Devices - 人走茶良 - 博客园 (cnblogs.com)
Try to use a physical machine as much as possible; virtual machines won’t be ideal…
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.