Beginner Questions After Watching TiKV 101

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 看 101 TiKV 初学者疑问

| username: 洪七表哥

Generally speaking, does each table have its own separate region, or do all tables share regions?

  1. If each table has its own Regions, the large number of Regions could make heartbeats to PD a performance problem (the teacher gave an example of more than 50,000 Regions).
  2. If tables share Regions, a full table scan on a single table might have to scan every Region (for example, if each Region holds only one row of that table).
| username: 洪七表哥 | Original post link

Is the above understanding correct? If so, how does TiDB solve this problem?

| username: zhaokede | Original post link

Each table occupies its own Region exclusively.

| username: 小于同学 | Original post link

A table occupies its Region exclusively.

| username: TiDBer_21wZg5fm | Original post link

Each table has an exclusive Region.

| username: tidb菜鸟一只 | Original post link

A Region is generally about 96 MB. In most cases, one table spans multiple Regions. If cross-table merging is enabled, several small tables may end up in the same Region.

| username: tidb狂热爱好者 | Original post link

I didn’t understand this question either. Now I know that by default, one table has multiple regions, and one region does not correspond to multiple tables.

| username: 这里介绍不了我 | Original post link

With cross-table merging enabled, one Region can hold data from multiple tables. With it disabled, each Region holds data from only one table. One advantage of enabling it is that it reduces the number of empty Regions.

| username: TiDBer_nhPDmAOw | Original post link

I don’t really understand either.

| username: TiDBer_小阿飞 | Original post link

    • Region: The smallest unit of load balancing. The full data in TiKV is stored in an ordered manner. PD divides the full data into a series of Regions based on the size of the data. Each Region carries a small range of the full data.

So it is an individual region.

| username: cassblanca | Original post link

By default, it is a separate region. If tables are merged, it may result in multiple tables sharing a region.

[quote=“Tidb789, post:1, topic:1024592”]
If regions are shared, a full table scan on a single table might require scanning all regions (for example, if each region only has one piece of data).
[/quote]
This issue is not caused by shared Regions. Read/write amplification comes from the architecture of TiKV itself. See the replies in the post "What is read amplification and write amplification" in the TiDB Q&A Community (asktug.com).

| username: zhanggame1 | Original post link

By default, creating a new table creates a new Region. If the table holds very little data, the parameter enable-cross-table-merge can be enabled to allow Regions to merge across tables.

By querying, we can see that a region may include data from multiple tables.

"If regions are shared, a full table scan for a single table might require scanning all regions (e.g., if each region only has one row of data)." You don't need to worry about this. A Region is a logical concept; physically, all TiKV data is one ordered list of key-value pairs, and the table ID is part of the key. The data of one table is therefore contiguous, so a table scan reads one continuous key range and never has to jump between scattered Regions.
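The point about contiguous keys can be illustrated with a small sketch. The key layout here is a simplification of the `t{tableID}_r{rowID}` row-key format, and the Region boundaries and the helper names (`row_key`, `table_range`, `regions_for_scan`) are made up for illustration. It only shows that a scan of one table touches the Regions overlapping that table's key range, even when small tables share a Region.

```python
import bisect
import struct

def table_prefix(table_id):
    """Simplified key prefix for a table: 't' + big-endian table ID."""
    return b"t" + struct.pack(">Q", table_id)

def row_key(table_id, row_id):
    """Simplified row key; big-endian integers keep byte order equal to numeric order."""
    return table_prefix(table_id) + b"_r" + struct.pack(">Q", row_id)

def table_range(table_id):
    """Half-open key range [start, end) that contains every key of one table."""
    return table_prefix(table_id), table_prefix(table_id + 1)

def regions_for_scan(region_starts, start, end):
    """Indexes of the Regions whose key range overlaps [start, end).

    region_starts is sorted; Region i covers [region_starts[i], region_starts[i + 1]).
    """
    first = max(bisect.bisect_right(region_starts, start) - 1, 0)
    last = bisect.bisect_left(region_starts, end)
    return list(range(first, last))

# Hypothetical layout: tables 1-3 share Region 0, table 4 spans Regions 1 and 2,
# and table 5 lives in Region 3.
region_starts = [b"", table_prefix(4), row_key(4, 1000), table_prefix(5)]

print(regions_for_scan(region_starts, *table_range(2)))  # scan of small table 2
print(regions_for_scan(region_starts, *table_range(4)))  # scan of large table 4
```

In this toy layout, scanning table 2 only touches the shared Region, and scanning table 4 only touches its own two Regions.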

The read/write amplification mentioned earlier comes mainly from RocksDB, the storage engine underneath TiKV. It turns every insert, delete, and update into the insertion of a new key, so old versions of the data stay on disk until they are cleaned up later.
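A toy model of that append-only behavior, as a rough sketch only: the `ToyLSM` class is hypothetical and is not how RocksDB is implemented. It just shows that updates and deletes add entries instead of overwriting, and that stale entries disappear only after a compaction step.

```python
class ToyLSM:
    """Append-only store: every write adds an entry, nothing is changed in place."""

    def __init__(self):
        # Each entry is (key, value); a value of None marks a delete.
        self.entries = []

    def put(self, key, value):
        self.entries.append((key, value))

    def delete(self, key):
        self.entries.append((key, None))

    def get(self, key):
        # The newest entry for a key wins, so search from the end.
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None

    def compact(self):
        # Keep only the newest entry per key and drop deleted keys.
        latest = {}
        for k, v in self.entries:
            latest[k] = v
        self.entries = [(k, v) for k, v in sorted(latest.items()) if v is not None]

db = ToyLSM()
db.put("a", 1)
db.put("a", 2)   # update: a second entry for "a"
db.put("b", 1)
db.delete("b")   # delete: a marker entry for "b"
entries_before = len(db.entries)
db.compact()
entries_after = len(db.entries)
```

Two live operations' worth of data (one key with one value) occupied four entries before compaction, which is the kind of overhead the reply refers to.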

| username: TiDBer_JUi6UvZm | Original post link

By default, it occupies a region separately. I’m also curious if so many regions sending messages to PD would cause PD to become a bottleneck. I’ll look into how PD handles this later.

| username: TiDBer_JUi6UvZm | Original post link

Why won’t PD become a bottleneck? Can someone who understands explain it?

| username: terry0219 | Original post link

The tikv-client component in tidb-server will cache the region information from PD, which can reduce the pressure on PD.
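A minimal sketch of the caching idea, assuming a made-up `FakePD` lookup service and a simple `RegionCache`; neither name comes from TiDB's code. Note that this kind of cache cuts down routing lookups from tidb-server; Region heartbeats are sent by TiKV, so the cache does not change those.

```python
class FakePD:
    """Hypothetical stand-in for PD: returns the Region that contains a key."""

    def __init__(self, regions):
        self.regions = regions  # list of (start_key, end_key, store)
        self.lookups = 0        # how many times the client had to ask PD

    def locate(self, key):
        self.lookups += 1
        for start, end, store in self.regions:
            if start <= key < end:
                return (start, end, store)
        raise KeyError(key)

class RegionCache:
    """Client-side cache of Region ranges, filled lazily from PD."""

    def __init__(self, pd):
        self.pd = pd
        self.cached = []

    def locate(self, key):
        for start, end, store in self.cached:
            if start <= key < end:
                return store              # cache hit: no request to PD
        region = self.pd.locate(key)      # cache miss: ask PD once
        self.cached.append(region)
        return region[2]

    def invalidate(self, key):
        # Drop the cached Region for a key, e.g. after it split or moved.
        self.cached = [r for r in self.cached if not (r[0] <= key < r[1])]

pd = FakePD([("a", "m", "store-1"), ("m", "z", "store-2")])
cache = RegionCache(pd)
stores = [cache.locate(k) for k in ("b", "c", "d", "n", "o")]
```

Five key lookups cost only two requests to the fake PD, because keys in an already cached Region range are answered locally.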

| username: 洪七表哥 | Original post link

Boss, how do you solve the performance issue caused by too many regions sending heartbeats to PD?

| username: TIDB-Learner | Original post link

Is it correct to understand that there are multiple scenarios for the relationship between table and region: 1:1, 1:N, N:1, N:N? By default, a region contains data from only one table.

| username: 洪七表哥 | Original post link

I thought a Region was also a physical concept.
So, can it be understood that large tables exclusively occupy multiple Regions, while small tables live together in merged Regions? Right?

| username: 洪七表哥 | Original post link

I have a question. In the video, it is mentioned that it is necessary to send heartbeats to PD regularly. TiDB is used for PB-level capacity. How do you solve the performance overhead caused by a large number of heartbeats in this situation?

| username: zhanggame1 | Original post link

The current solution is to adjust the default size of the region. By increasing the size, the number of regions will naturally decrease.
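A back-of-the-envelope estimate of the effect, as a sketch under simplifying assumptions: every Region is filled to its target size, replicas are ignored, and the 256 MiB figure is only an example value, not a recommendation. The helper name `estimated_region_count` is made up.

```python
MiB = 1024 ** 2
PiB = 1024 ** 5

def estimated_region_count(data_bytes, region_bytes):
    """Rough number of Regions needed, rounded up, assuming full Regions."""
    return (data_bytes + region_bytes - 1) // region_bytes

default_count = estimated_region_count(1 * PiB, 96 * MiB)
larger_count = estimated_region_count(1 * PiB, 256 * MiB)

print(f"1 PiB at 96 MiB per Region:  {default_count:,} Regions")
print(f"1 PiB at 256 MiB per Region: {larger_count:,} Regions")
```

Since the Region count scales inversely with Region size, a larger Region size means proportionally fewer Regions reporting to PD.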

There will be an official solution in the future, wait for the implementation of version 8.0: