Is there a way in TiDB to specify that closely related tables are in the same or adjacent regions?

translator_bot · June 21, 2024, 10:55am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 有没有办法指定关联密切的表在一个或者临近的region中？

| username: 春风十里

Is there a way to specify closely related tables to be in the same or adjacent regions in TiDB?

For example, tables A and B are frequently queried together. If they are in the same region, it would definitely save IO and other resources. Is there any way to configure this?

translator_bot · June 21, 2024, 10:55am

| username: zhanggame1 | Original post link

This feature is not available. If the table data volume is small and not frequently modified, you can try small table caching.
If tables A and B are frequently queried together, TiKV will generally cache the data in memory, so you don’t need to worry about the speed.

translator_bot · June 21, 2024, 10:55am

| username: 小龙虾爱大龙虾 | Original post link

It doesn’t have the ability to perform joins locally; it has to be done on TiDB.

translator_bot · June 21, 2024, 10:55am

| username: dba远航 | Original post link

Using the label feature, you can locate the table to IDC, rack, and KV, but it is difficult to specify the exact REGION.

translator_bot · June 21, 2024, 10:55am

| username: 春风十里 | Original post link

The documentation says that multiple tables can be merged into the same region. I see that many of my small systems have tables with dozens or hundreds of records, and they are often accessed together. If they can be automatically merged into one region, it should be very good.
The official documentation says it can be merged but doesn’t explain how, so it should be automatic.

Before version 5.0, TiDB had the cross-table region merge feature disabled by default. Starting from version 5.0, TiDB enables the cross-table region merge feature by default to reduce the number of empty regions and lower the network, memory, and CPU overhead of the system. You can disable this feature by modifying the schedule.enable-cross-table-merge configuration item.

translator_bot · June 21, 2024, 10:55am

| username: 春风十里 | Original post link

This doesn’t seem to meet my expectations.

translator_bot · June 21, 2024, 10:55am

| username: 春风十里 | Original post link

How to understand local join?

translator_bot · June 21, 2024, 10:55am

| username: zhanggame1 | Original post link

Merging is automatically triggered only when there are a lot of empty regions.

translator_bot · June 21, 2024, 10:55am

| username: xfworld | Original post link

Would using global temporary tables meet your needs…

translator_bot · June 21, 2024, 10:55am

| username: Jellybean | Original post link

This feature does not exist.
If your table is not large, you can try the small table cache feature or temporary tables.

translator_bot · June 21, 2024, 10:55am

| username: 小龙虾爱大龙虾 | Original post link

Just like the logic of sharding, if two associated keys use the same sharding logic, then the join operation doesn’t need to be done at the computation layer; it can be done directly at the storage layer because the data that can be joined will definitely be on the same storage node.

translator_bot · June 21, 2024, 10:55am

| username: 春风十里 | Original post link

Okay, thank you for your reply.

translator_bot · June 21, 2024, 10:55am

| username: oceanzhang | Original post link

Labels can be marked, but it seems they are not that fine-grained.

translator_bot · June 21, 2024, 10:55am

| username: 有猫万事足 | Original post link

Actually, from TiDB’s perspective, reading from adjacent regions is not as efficient as reading from different machines. Although reading from a physically nearby location on a single machine might be faster locally, it has an upper limit on read capacity. However, the read capacity of multiple machines has no such limit.

Additionally, if it’s just a join operation, letting TiDB handle it is fine. But if it’s a join plus aggregation, it’s best to push it directly to TiFlash and let TiDB fetch the results. Doing this on a columnar storage can significantly reduce the amount of data scanned.

translator_bot · June 21, 2024, 10:55am

| username: zhanggame1 | Original post link

TiDB currently does not have this design; joins must be performed on the TiDB server.