Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tikv支持手动释放block cache里的缓存么? (Does TiKV support manually releasing the block cache?)
Currently, we are using Spark for big data algorithm testing, and the database is TiDB v6.1. We are using table A for the tests. During the first run there is no data in the cache, so execution takes longer. We then added data to table A to enlarge the test data set, but because TiKV still holds cached data for table A, the second run does not reflect the true execution time. Apart from restarting the TiKV service, is there a way to manually release the TiKV cache?
It seems like it is not supported.
Would it be effective if I directly clear all the caches in the Linux system?
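For reference, this is what clearing the Linux system caches looks like (a hedged sketch; it requires root). Note that this only drops the kernel's page cache and slab objects; TiKV's block cache is allocated inside the tikv-server process itself, so it would not be released by this.

```shell
# Hedged sketch: drop the OS-level caches (root required).
# This frees the kernel page cache, dentries, and inodes only --
# it does NOT touch TiKV's in-process block cache.
sync                                  # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches     # 1 = page cache, 2 = slab, 3 = both
```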
It seems that there is no relevant documentation, so it should not be supported.
You can try it in a test environment, but it is not recommended to forcibly release memory in a production environment.
SQL can use SQL_NO_CACHE to prevent it from being cached. I’m not very familiar with Spark, but if it also uses SQL, you can give it a try.
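A minimal sketch of the modifier mentioned above, assuming a test table named A (the table name is illustrative). TiDB accepts the MySQL-style SQL_NO_CACHE keyword for compatibility; whether it actually bypasses TiKV's caching is exactly what the later replies discuss.

```sql
-- Hedged sketch: MySQL-compatible cache-bypass modifier on table A.
SELECT SQL_NO_CACHE COUNT(*) FROM A;
```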
If it’s a testing environment, I think you can try this and see.
Is this setting written in the TiKV configuration file or configured as a parameter in the database?
This should not be supported.
I don’t know if Spark has such an operator.
Expert, after I set sql_no_cache, the execution plan still shows a cache hit, doesn't it?
If you are using TiSpark, here are a few small suggestions:
- TiSpark reads data directly from TiKV, and I believe the block cache is in the TiDB Server.
- Using the mysql client to run explain may not represent the actual execution process in Spark, so it is recommended to use Explain in Spark.
- Depending on your Spark application scenario, there might be a shuffle cache, which could affect your algorithm's measurements. It is advisable to take this into account.
Would reducing the block cache also achieve the goal?
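One way to try the idea above is to shrink the block cache at runtime. A hedged sketch, assuming the online config change supported by recent TiDB/TiKV versions; the capacity value is purely illustrative, not a recommendation:

```sql
-- Hedged sketch: reduce TiKV's block cache capacity online.
-- '1GiB' is an illustrative value; pick one suited to your machine.
SET CONFIG tikv `storage.block-cache.capacity` = '1GiB';
```

The equivalent static setting lives under `[storage.block-cache]` (`capacity = "1GiB"`) in the TiKV configuration file.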
sql_no_cache means the query will not use the cache, but it will not clear the cache. If you use it starting from the first execution, subsequent executions will not use the cache either.
The block cache belongs to TiKV, or more precisely, to RocksDB.
Well, Spark SQL does not support this kind of syntax, but JDBC does.