What does the parameter "slow-store-evicting-affected-store-ratio-threshold" mean when executing "config show" in pd-ctl?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd-ctl执行config show多了slow-store-evicting-affected-store-ratio-threshold这个参数,含义是什么?

| username: dba-kit

I noticed that version 7.5 added the slow-store-evicting-affected-store-ratio-threshold parameter. Is it used to adjust the threshold for evicting leaders from a slow store?

| username: TiDBer_jYQINSnf

slow-store-evicting-affected-store-ratio-threshold

| username: dba-kit

However, I still have some questions. How should this parameter be used, and how does a node's slowness score relate to the number of stores?

| username: TiDBer_jYQINSnf

I originally found this keyword in the code and posted it. Answering this question is indeed a bit tricky, so I’ll take another look at the code. :smiley:

| username: TiDBer_jYQINSnf

Check this out: https://github.com/tikv/pd/pull/5808. It seems to introduce a mechanism for evicting leaders to address occasional slow disk issues. It’s a bit complex, so take a look when you have time.
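
To make the question about store count concrete, here is a minimal sketch of one plausible reading of the parameter name. It is an assumption, not confirmed against the PD source in this thread: a slow store is only treated as an eviction candidate once at least a given fraction of all stores appear to be affected by it. The function names and the 0.5 ratio are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch, NOT PD's actual code. It assumes the threshold is a
# fraction of the total store count that must look "affected" by the slow
# store before leader eviction is considered.

def min_affected_stores(store_count: int, ratio_threshold: float) -> int:
    """Smallest number of affected stores that satisfies the ratio."""
    return int(store_count * ratio_threshold)


def should_consider_eviction(store_count: int, affected_count: int,
                             ratio_threshold: float) -> bool:
    """True when enough stores are affected to act on the slow store."""
    return affected_count >= min_affected_stores(store_count, ratio_threshold)


if __name__ == "__main__":
    # 10 stores with an illustrative ratio of 0.5 (not necessarily the default).
    for affected in (2, 6):
        print(affected, should_consider_eviction(10, affected, 0.5))
```

Under this reading, a higher ratio makes eviction more conservative, because more stores must show an impact before PD acts, while a lower ratio lets eviction start when only a few stores are affected.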

| username: dba-kit

Strange. The parameter does seem to have been introduced by this PR, but the PR was merged into the main branch back in February 2023. Why is the parameter still missing from version 6.5?

| username: dba-kit

I get this kind of alert in production from time to time, and I’ve become immune to it. :joy:

| username: yiduoyunQ

Disk jitter is a challenging issue for any database. Thanks to its distributed architecture, TiKV can proactively evict leaders from an affected store to minimize the impact. This feature has been around for a long time, but the logic for deciding when to evict has never satisfied every scenario. Users generally want leaders evicted as fast as possible, ideally without noticing the disk jitter at all. That is obviously impossible; only replacing the jittery disk can achieve it. Any intervention has to weigh its benefits against its risks, taking into account long-term jitter lasting minutes to hours, daily jitter lasting only seconds, the severity of the jitter, and the impact of leader eviction on cluster performance.

That said, adding a parameter to the slow store feature might be considered a compromise, allowing users to customize the configuration themselves.
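
For anyone who wants to check the current value before customizing it, the following is a small sketch that pulls the parameter out of saved `config show` output. It assumes the output is JSON; the helper name `find_key` and the sample document are made up for illustration and are not real pd-ctl output. The search is recursive so that it does not depend on which section the key sits under.

```python
import json

PARAM = "slow-store-evicting-affected-store-ratio-threshold"


def find_key(node, key):
    """Depth-first search for `key` in nested dicts and lists.

    Returns the first value found, or None when the key is absent.
    """
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up sample for illustration only; the layout and value are assumptions.
SAMPLE = '{"schedule": {"slow-store-evicting-affected-store-ratio-threshold": 0.3}}'

if __name__ == "__main__":
    print(find_key(json.loads(SAMPLE), PARAM))
```

If the key is missing, as reported above for version 6.5, `find_key` returns `None`, which gives a quick way to tell whether a given PD version exposes the parameter.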