Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb 有没有办法指定关联密切的表在一个或者临近的region中?
Is there a way to specify closely related tables to be in the same or adjacent regions in TiDB?
For example, tables A and B are frequently queried together. If they are in the same region, it would definitely save IO and other resources. Is there any way to configure this?
This feature is not available. If the table data volume is small and not frequently modified, you can try small table caching.
If tables A and B are frequently queried together, TiKV will generally cache the data in memory, so you don’t need to worry about the speed.
It doesn’t have the ability to perform joins locally; it has to be done on TiDB.
Using the label feature, you can locate the table to IDC, rack, and KV, but it is difficult to specify the exact REGION.
The documentation says that multiple tables can be merged into the same region. I see that many of my small systems have tables with dozens or hundreds of records, and they are often accessed together. If they can be automatically merged into one region, it should be very good.
The official documentation says it can be merged but doesn’t explain how, so it should be automatic.
Before version 5.0, TiDB had the cross-table region merge feature disabled by default. Starting from version 5.0, TiDB enables the cross-table region merge feature by default to reduce the number of empty regions and lower the network, memory, and CPU overhead of the system. You can disable this feature by modifying the schedule.enable-cross-table-merge
configuration item.
This doesn’t seem to meet my expectations.
How to understand local join?
Merging is automatically triggered only when there are a lot of empty regions.
Would using global temporary tables meet your needs…
This feature does not exist.
If your table is not large, you can try the small table cache feature or temporary tables.
Just like the logic of sharding, if two associated keys use the same sharding logic, then the join operation doesn’t need to be done at the computation layer; it can be done directly at the storage layer because the data that can be joined will definitely be on the same storage node.
Okay, thank you for your reply.
Labels can be marked, but it seems they are not that fine-grained.
Actually, from TiDB’s perspective, reading from adjacent regions is not as efficient as reading from different machines. Although reading from a physically nearby location on a single machine might be faster locally, it has an upper limit on read capacity. However, the read capacity of multiple machines has no such limit.
Additionally, if it’s just a join operation, letting TiDB handle it is fine. But if it’s a join plus aggregation, it’s best to push it directly to TiFlash and let TiDB fetch the results. Doing this on a columnar storage can significantly reduce the amount of data scanned.
TiDB currently does not have this design; joins must be performed on the TiDB server.