Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 缓存表很慢 (Cached table is very slow)
There is a dimension table with fewer than 100,000 rows that is set up as a cached table and is updated once an hour. In the slow query log, get_snapshot_time is very long for statements that read this cached table. Could this be caused by too much historical data? In other words, after the data is updated, when TiDB loads the cached table from TiKV into memory, does it have to go through too many historical MVCC versions? GC currently keeps only the last two hours of changes, so in theory the worst case is about twice the current row count, i.e. fewer than 200,000 rows. Yet in some slow statements get_snapshot_time still takes several seconds. What could be causing this?
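For reference, this is roughly how I pull the relevant slow statements out for inspection (a minimal sketch; dim_t is a placeholder name for the dimension table, and the exact columns may vary by TiDB version):

```sql
-- Recent slow statements that touch the cached dimension table, together with
-- their plans; get_snapshot_time shows up in the per-operator execution info
-- inside the plan column. dim_t is a placeholder table name.
SELECT time, query_time, process_time, wait_time, plan
FROM information_schema.slow_query
WHERE query LIKE '%dim_t%'
ORDER BY time DESC
LIMIT 10;
```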
Is the data volume too large?
As I said, the data volume is not large: fewer than 100,000 rows. Even counting historical versions, it would not exceed 200,000 in the worst case.
Post the execution plan and let’s take a look.
Tables larger than 64 MiB cannot be used as cached tables. Your 100,000 rows are not that large, but updating every hour may not be a good fit. You can check the mysql.table_cache_meta table for lock state and the lock lease.
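Something along these lines (a sketch; 123 and dim_t are placeholders, and you can look up the real table ID via information_schema.tables first):

```sql
-- Find the internal table ID of the dimension table (placeholder names).
SELECT tidb_table_id
FROM information_schema.tables
WHERE table_schema = 'your_db' AND table_name = 'dim_t';

-- Then inspect the cache lock state and lease for that table ID.
SELECT * FROM mysql.table_cache_meta WHERE tid = 123;
```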
Check whether the execution plan has hit the cache.
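A rough way to check (dim_t is a placeholder): if I remember correctly, a read served from the cache shows a UnionScan operator in the plan, while a cache miss still goes to TiKV through a plain TableReader.

```sql
-- If the cache is hit, the plan should contain a UnionScan operator over dim_t;
-- otherwise the read goes to TiKV via TableReader/TableFullScan.
EXPLAIN SELECT COUNT(*) FROM dim_t;
```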
One condition for using cached tables is that the data does not change frequently; yours changes once an hour.
Additionally, the number of rows in the table doesn’t matter much; the main concern is the size.
Is it only this cache table that has this problem?
Cached tables are not well suited to updates; it's best not to modify them.
Please post the execution plan so we can check whether the scan involves many historical versions.
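For example (dim_t is a placeholder name), EXPLAIN ANALYZE exposes the scan detail; a large gap between total_keys and total_process_keys usually means the scan is wading through many historical MVCC versions:

```sql
-- Compare total_keys with total_process_keys in the scan_detail of the
-- TableFullScan execution info; a big difference points to old versions
-- that GC/compaction has not cleaned up yet.
EXPLAIN ANALYZE SELECT COUNT(*) FROM dim_t;
```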
You can check whether TiKV storage has an I/O bottleneck.
Let’s take a look at the execution plan?
Given that you need to update it every hour, this scenario is not really suited to a cached table. Updating a cached table is relatively slow and cumbersome; cached tables fit scenarios with a low update frequency and small update volumes.
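If you do keep it as a cached table, one knob worth looking at is the cache lease (writes have to wait for the lease to expire, so a longer lease makes reads steadier but writes slower). A sketch; if I recall correctly the unit is seconds with a default of 3:

```sql
-- Check the current cache lease.
SHOW VARIABLES LIKE 'tidb_table_cache_lease';

-- Example adjustment; trade-off: a longer lease steadies reads, slows writes.
SET GLOBAL tidb_table_cache_lease = 10;
```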
Check if the data volume is too large.
If the cache table involves frequent write operations, it may lead to write hotspots, affecting read performance.
Take a look at the execution plan.
Check the execution plan to analyze which step is slow, and then optimize it. Also, take a look at the system load, such as CPU, memory, and disk I/O.
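For a quick look at load without leaving SQL (assuming Grafana isn't handy), something like this dumps the load metrics TiDB collects from each instance; the exact rows returned depend on the version:

```sql
-- Cluster-wide CPU / memory / disk load snapshot as reported by each instance.
SELECT type, instance, device_type, device_name, name, value
FROM information_schema.cluster_load
WHERE device_type IN ('cpu', 'memory', 'disk');
```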
Cached tables are a new feature starting from v6.0; they have a size limit, and it is best to avoid changing them frequently.
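A rough way to check the table's size against the 64 MiB limit (your_db and dim_t are placeholders; the figures in information_schema.tables are estimates based on statistics):

```sql
-- Estimated data and index size of the dimension table, in bytes.
SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = 'your_db' AND table_name = 'dim_t';
```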
Check if it’s caused by the cache table being too large.