TiUP reports "failed increasing schedule limit: no endpoint available" when upgrading to 7.5.1

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiUP在升级到7.5.1时,报"failed increasing schedule limit: no endpoint available"

| username: dba-kit

The error message is shown below. Although the TiKV nodes continue to be upgraded afterwards, I don't know whether TiUP retries increasing the schedule limit here. In my test environment I used --force, which skips transfer-leader, so I couldn't observe the effect.
If the limit is not increased, the TiKV upgrade takes longer, which extends the maintenance window for the whole change.

+ [ Serial ] - UpgradeCluster
Upgrading component pd
	Restarting instance 172.30.240.10:2379
	Restart instance 172.30.240.10:2379 success
	Restarting instance 172.30.240.4:2379
	Restart instance 172.30.240.4:2379 success
	Restarting instance 172.30.240.12:2379
	Restart instance 172.30.240.12:2379 success
Upgrading component tikv
failed increasing schedule limit: no endpoint available, the last err was: error requesting http://172.30.240.4:2379/pd/api/v1/config, response: redirect to not leader
, code 500, ignore
	Restarting instance 172.30.240.3:20160
	Restart instance 172.30.240.3:20160 success
| username: dba-kit | Original post link

It seems the PD node that was picked for the request had just gone through a re-election, and the leader had switched to another node, hence the "redirect to not leader" response.
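
If you want to confirm this, pd-ctl can show the current PD leader. A minimal example, assuming pd-ctl is run through tiup and `<pd-addr>` stands for any PD endpoint in the cluster:

```shell
# Show the current PD leader (any PD member can answer this query).
tiup ctl:v7.5.1 pd -u http://<pd-addr>:2379 member leader show
```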

| username: dba-kit | Original post link

I reviewed the code myself: when the call fails, TiUP only logs the error and does not retry adjusting the PD parameters.


The increaseScheduleLimit function only modifies the following two PD parameters, and the logic is very simple: when the current value is below the corresponding threshold, it tries to raise it by the offset (see the shell sketch after the constants below).

  1. leader-schedule-limit: no more than 64
  2. region-schedule-limit: no more than 1024
	leaderScheduleLimitOffset = 32
	regionScheduleLimitOffset = 512

	leaderScheduleLimitThreshold = 64
	regionScheduleLimitThreshold = 1024
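
For illustration, the same "if below the threshold, raise by the offset" behavior can be reproduced by hand against the PD API. This is only a rough shell sketch of the logic, not the actual TiUP code; it assumes jq is installed, and `<pd-leader>` is a placeholder for the current PD leader address:

```shell
# Sketch: raise leader-schedule-limit by the offset if it is below the threshold.
# <pd-leader> must be the PD leader, otherwise PD replies "redirect to not leader".
PD="http://<pd-leader>:2379"

current=$(curl -s "$PD/pd/api/v1/config" | jq '.schedule."leader-schedule-limit"')
if [ "$current" -lt 64 ]; then
  curl -s -X POST "$PD/pd/api/v1/config" -d "{\"leader-schedule-limit\": $((current + 32))}"
fi
# region-schedule-limit follows the same pattern with offset 512 and threshold 1024.
```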

If you encounter this error during a production upgrade and the transfer-leader phase feels too slow, you can manually adjust these two parameters with pd-ctl to speed things up.
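
For example, a minimal sketch (pd-ctl invoked through tiup; `<pd-addr>` is a placeholder for a PD endpoint, and the values simply follow the thresholds above):

```shell
# Temporarily raise the scheduling limits to speed up transfer-leader during the upgrade.
tiup ctl:v7.5.1 pd -u http://<pd-addr>:2379 config set leader-schedule-limit 64
tiup ctl:v7.5.1 pd -u http://<pd-addr>:2379 config set region-schedule-limit 1024

# Remember to restore the original values once the upgrade has finished.
tiup ctl:v7.5.1 pd -u http://<pd-addr>:2379 config show
```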

| username: Kongdom | Original post link

:+1: :+1: :+1: Self-resolved~
However, I'm now leaning more and more towards offline upgrades. Online upgrades always mean waiting for leader migration.

| username: FutureDB | Original post link

Same here. Although TiDB supports online upgrades, the whole process still takes quite a long time. Unless we have no other choice, we go with offline upgrades: they are simpler, faster, and the impact window is shorter.

| username: Kongdom | Original post link

:handshake: :handshake: :handshake: Kindred spirits

| username: dba远航 | Original post link

This should be related to the number of CPUs.

| username: TIDB-Learner | Original post link

It would be great to have an official document for downtime (offline) upgrades. :100:

| username: Kongdom | Original post link

The official documentation is great :+1:

| username: 芝士改变命运 | Original post link

After the upgrade

| username: dba-kit | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.