Same SQL, Different TiDB Instances Yield Inconsistent Results (Execution Plans Also Differ)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 同一SQL,不同tidb实例 结果不一致(执行计划也不一致)

| username: foxchan

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.2
[Encountered Problem: Phenomenon and Impact]
The same SQL yields inconsistent results on different TiDB instances.
SQL Statement

SELECT
	a.*
FROM
	(
		SELECT
			a.*, ROUND(a.stat_cost * 100 / b.cost, 2) cost_rate,
			a.stat_cost cost,
			ROUND(a.ad_created * 100 / b.ad_created, 2) ad_created_rate,
			ROUND(a.valid_ads * 100 / b.valid_ads, 2) valid_ads_rate,
			ROUND(a.operates * 100 / b.operates, 2) operates_rate
		FROM
			(
				SELECT
					a.*,@rowNum :=@rowNum + 1 AS num
				FROM
					(
						SELECT
							operator,
							operator_name,
							ROUND(SUM(cost) / 1000000, 2) stat_cost,
							ROUND(SUM(goal_cost) / 1000000, 2) goal_cost,
							SUM(ad_created) ad_created,
							SUM(valid_ads) valid_ads,
							SUM(operates) operates,
							pdei.position,
							a.dept_id,
							a.dept_name,
							pdui.avatar,
							CASE
						WHEN pdui.position LIKE 'Manager' THEN
							operator_name
						ELSE
							''
						END manager
						FROM
							account_board_stat_daily_copy a
						LEFT JOIN pig_dd_employee_info pdei ON a.operator = pdei.user_id
						LEFT JOIN mbg_core.pig_dd_user_info pdui ON a.operator = pdui.userid
						WHERE
							stat_date BETWEEN '2023-02-27'
					AND '2023-03-04' 
						AND operator > 1000
						AND operate_type IN (0, 3)
						GROUP BY
							operator
						ORDER BY
							stat_cost ASC
					) a,
					(SELECT @rowNum := 0) b
			) a
		LEFT JOIN (
			SELECT
				a.*
			FROM
				(
					SELECT
						operator,
						operator_name,
						ROUND(SUM(cost) / 1000000, 2) cost,
						ROUND(SUM(goal_cost) / 1000000, 2) goal_cost,
						SUM(ad_created) ad_created,
						SUM(valid_ads) valid_ads,
						SUM(operates) operates,
						pdei.position,
						a.dept_id,
						a.dept_name,
						pdui.avatar
					FROM
						account_board_stat_daily_copy a
					LEFT JOIN pig_dd_employee_info pdei ON a.operator = pdei.user_id
					LEFT JOIN mbg_core.pig_dd_user_info pdui ON a.operator = pdui.userid
					WHERE
						stat_date BETWEEN '2023-02-20'
					AND '2023-02-26'
					AND operator > 0
					AND operate_type IN (0, 3)
					GROUP BY
						operator
					ORDER BY
						cost
				) a
		) b ON a.operator = b.operator
	) a
WHERE
	1 = 1
ORDER BY
	a.num ASC
LIMIT 0,
 100

Abnormal Result:
Health
image
Execution Plan

Normal Result TiDB Instance
Health


Execution Plan

| username: h5n1 | Original post link

Was this version upgraded?

| username: foxchan | Original post link

Upgraded from version 5. After the upgrade, all table execution plans were rerun. It’s been 5 months since the upgrade.

| username: h5n1 | Original post link

Check the version using tiup cluster display, then check the MySQL connection prompt information and select version() on the two instances to see if the versions are consistent.

| username: foxchan | Original post link

The image you provided is not visible. Please provide the text you need translated.

| username: 爱白话的晓辉 | Original post link

Are all the component versions displayed on the dashboard consistent?

| username: foxchan | Original post link

K8s installation, TiDB configuration versions are all consistent.

| username: h5n1 | Original post link

Check the images of the two TiDB instances again.

| username: 爱白话的晓辉 | Original post link

Are the inconsistent results random or specific to a particular TiDB instance?

| username: foxchan | Original post link

Specified TiDB instance

| username: 爱白话的晓辉 | Original post link

Try taking the faulty one offline and then bringing it back online.

| username: 裤衩儿飞上天 | Original post link

Plan cache?

| username: WalterWj | Original post link

The execution results are not fixed. Generally, it depends on whether the SQL results are affected by the order of the query results or whether the sorting fields of the entire pagination are complete and non-repetitive.

Different nodes have different execution plans. Try the following:

  1. Manually collect the statistics of the relevant tables.
  2. Restart the tidb-server to clear the table information cached in memory.
| username: foxchan | Original post link

Rerunning the execution plan did not take effect, but restarting the abnormal tidb-server restored it.
How should this be avoided? Should we regularly restart the tidb server?

| username: WalterWj | Original post link

Try upgrading, it feels like the table information in TiDB hasn’t been updated. This is quite rare. :thinking:

| username: buddyyuan | Original post link

Statistics are not in memory, or it might be an issue with asynchronous loading.

| username: h5n1 | Original post link

I feel that the upgrade process for this problematic pod is still incomplete.

| username: foxchan | Original post link

There are a total of 13 TiDB instances, and now multiple instances are found to be abnormal. It has nothing to do with incomplete upgrades.

| username: foxchan | Original post link

The TiDB instances in the cluster were restarted one by one, but the results are still incorrect.
The results are incorrect when sorted by cost.
image

| username: Running | Original post link

The statistics of tidbserver are incorrect. You can try executing ANALYZE TABLE on each tidbserver node to recalculate the statistics.