TiDB Read Hotspot Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb读取热点问题

| username: 烂番薯0

Dear experts, what does this read hotspot mean? How can we eliminate read hotspots?

| username: caiyfc | Original post link

Look at the brightest spot: how much traffic per minute does it show?

| username: 小龙虾爱大龙虾 | Original post link

Where do you see the hotspot? The brightness in this image is relative; you can't call something a hotspot just because it's bright.

| username: 烂番薯0 | Original post link

120 million

| username: 烂番薯0 | Original post link

So how is a hotspot determined? By the read traffic per minute?

| username: 小龙虾爱大龙虾 | Original post link

Hotspots are relative; as long as your TiKV nodes can handle it, it’s fine. For normal operations, 120 M is not an issue. Typically, you check this heatmap when you notice that the load on each TiKV node is unbalanced and there is severe skew in the read thread pool.
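
To make that check concrete, TiDB also exposes the cluster's current hot Regions through the `INFORMATION_SCHEMA.TIDB_HOT_REGIONS` table, so you can see which tables the read traffic is landing on before deciding whether a bright stripe matters. A minimal sketch (the `LIMIT` is arbitrary):

```sql
-- List the Regions currently reported as read hotspots, hottest first.
SELECT DB_NAME, TABLE_NAME, REGION_ID, FLOW_BYTES
FROM INFORMATION_SCHEMA.TIDB_HOT_REGIONS
WHERE TYPE = 'read'
ORDER BY FLOW_BYTES DESC
LIMIT 10;
```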

| username: 随缘天空 | Original post link

Read hotspots can occur when the data being read is concentrated on a single machine: the stored data is unbalanced and the Regions have not been properly scattered, so that one machine carries a high load.
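
One quick way to check whether a table's Regions are actually scattered is shown below; `orders` is a hypothetical table name used only for illustration.

```sql
-- Show how the table's Regions are distributed across stores.
-- If most leader Regions sit on the same store, reads on this table hit one machine.
SHOW TABLE orders REGIONS;
```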

| username: miya | Original post link

  1. Switch reads to Follower Read, using the other replicas to increase read capacity (see the sketch at the end of this reply).
  2. Distribute the data and use multiple TiDB nodes to improve access performance.

However, both of the above methods still require appropriate indexes to improve query performance. For a production environment, you can also temporarily disable the relevant feature in the application until the change has been verified in testing, and then re-enable it.
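
For reference, Follower Read is controlled by the `tidb_replica_read` system variable and can be enabled per session; a minimal sketch:

```sql
-- Let this session's reads be served by follower replicas instead of only the Region leader.
SET @@tidb_replica_read = 'follower';

-- Or allow both the leader and the followers to serve reads:
SET @@tidb_replica_read = 'leader-and-follower';
```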

| username: Hacker_QGgM2nks | Original post link

Common methods to eliminate read hotspots:

  1. Vertical Splitting: Split the hotspot table according to business logic, distributing the hotspot data across different tables. This can reduce the read pressure on a single table and improve overall read performance.
  2. Horizontal Splitting: Split the hotspot table based on a specific field, distributing the data across different partitions or shards. This can distribute read requests to different nodes, reducing the read pressure on a single node.
  3. Caching: Use caching technologies like Redis or Memcached to cache hotspot data. This can reduce read requests to the database and improve read performance.
  4. Load Balancing: Use a load balancer to evenly distribute read requests across different TiDB nodes. This can prevent overloading a single node and improve overall read performance.
  5. TiFlash: TiFlash is TiDB’s columnar storage engine, which can be used to store and query hotspot data. Storing hotspot data in TiFlash can reduce the read pressure on TiDB nodes and improve read performance.
  6. Pre-splitting: When creating a table, pre-distribute the data into different Regions based on the distribution of hotspot data. This can avoid data skew and improve read performance (a sketch follows this list).
  7. Query Optimization: Optimize query statements by using appropriate indexes, avoiding full table scans, and other methods to reduce the read pressure on hotspot data.
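
As a rough sketch of points 2 and 6 (the table name `orders`, the column layout, and all the numbers are hypothetical), TiDB can scatter implicit row IDs at creation time and pre-split a table's Regions explicitly:

```sql
-- Scatter implicit row IDs across shards and pre-split 2^4 = 16 Regions at creation,
-- so sequential inserts and the reads that follow them do not pile onto one Region.
CREATE TABLE orders (
    order_no VARCHAR(64),
    user_id  BIGINT,
    amount   DECIMAL(10, 2),
    KEY idx_user (user_id)
) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 4;

-- Pre-split an existing table into 16 Regions over an assumed row-ID range:
SPLIT TABLE orders BETWEEN (0) AND (1000000000) REGIONS 16;
```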

| username: miya | Original post link

Additionally, I missed one point: higher versions of TiDB can also use the small table cache, but it is mainly for data that doesn't change frequently. Once the data is updated, the cache is invalidated and the query falls back to reading the data again, so the hotspot can come back.
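
For completeness, the small table cache is enabled per table; a minimal sketch with a hypothetical table name:

```sql
-- Cache a small, rarely-updated table in TiDB's memory so reads stop hitting TiKV.
ALTER TABLE config_items CACHE;

-- Turn the cache off again if the table starts to change frequently.
ALTER TABLE config_items NOCACHE;
```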

| username: 烂番薯0 | Original post link

Today one table reached 1.2 G, but the TiKV node can handle it. The load on the TiKV nodes is unbalanced, though. Where can I check the read and write thread pools?

| username: 烂番薯0 | Original post link

Can you see this?

| username: 小龙虾爱大龙虾 | Original post link

There is no skew; the CPU usage of the TiKV nodes is almost the same. Optimize the SQL.

| username: 烂番薯0 | Original post link

Oh, I see. So during that time period the hotspot looked significant, but each TiKV node was relatively balanced and the CPUs were fully utilized. The main cause is therefore the SQL itself, not a particular node being accessed disproportionately, right?

| username: 有猫万事足 | Original post link

Additionally, the height of the bright area (its thickness along the Y axis) is critical. TiKV has a Region-based hotspot balancing mechanism, so the more Regions a hotspot spans, the easier it is to spread the traffic across all TiKV instances. The thicker and more numerous the bright stripes, the more dispersed the hotspot is and the more TiKV instances can share the load; the thinner and fewer the bright stripes, the more concentrated the hotspot is on specific TiKV instances, and the more it needs manual intervention and attention.

The read hotspots that need intervention should be the bright horizontal lines.

This graph actually indicates that reads are concentrated in time, which is close to the ideal situation. At most, you might need to pay attention to the tables at the bottom. In my experience, though, those are likely system tables related to statistics, which cannot be modified. :sweat_smile:

| username: 健康的腰间盘 | Original post link

It should be an SQL issue.

| username: 烂番薯0 | Original post link

Understood, so the best solution is to add TiFlash nodes for OLAP statistics.
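
For what it's worth, routing those OLAP statistics to TiFlash means adding a TiFlash replica for the table; a minimal sketch (the table and schema names are placeholders, and the replica count depends on how many TiFlash nodes you deploy):

```sql
-- Create one columnar (TiFlash) replica of the table; analytical scans can then be
-- served by TiFlash instead of adding read pressure on the TiKV row store.
ALTER TABLE orders SET TIFLASH REPLICA 1;

-- Check replication progress; AVAILABLE = 1 means the replica is ready.
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM INFORMATION_SCHEMA.TIFLASH_REPLICA
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'orders';
```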

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.