Same SQL, Different TiDB Instances Yield Inconsistent Results (Execution Plans Also Differ)

translator_bot · June 22, 2024, 3:23pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 同一SQL，不同tidb实例结果不一致(执行计划也不一致)

| username: foxchan

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.2
[Encountered Problem: Phenomenon and Impact]
The same SQL yields inconsistent results on different TiDB instances.
SQL Statement

SELECT
	a.*
FROM
	(
		SELECT
			a.*, ROUND(a.stat_cost * 100 / b.cost, 2) cost_rate,
			a.stat_cost cost,
			ROUND(a.ad_created * 100 / b.ad_created, 2) ad_created_rate,
			ROUND(a.valid_ads * 100 / b.valid_ads, 2) valid_ads_rate,
			ROUND(a.operates * 100 / b.operates, 2) operates_rate
		FROM
			(
				SELECT
					a.*,@rowNum :=@rowNum + 1 AS num
				FROM
					(
						SELECT
							operator,
							operator_name,
							ROUND(SUM(cost) / 1000000, 2) stat_cost,
							ROUND(SUM(goal_cost) / 1000000, 2) goal_cost,
							SUM(ad_created) ad_created,
							SUM(valid_ads) valid_ads,
							SUM(operates) operates,
							pdei.position,
							a.dept_id,
							a.dept_name,
							pdui.avatar,
							CASE
						WHEN pdui.position LIKE 'Manager' THEN
							operator_name
						ELSE
							''
						END manager
						FROM
							account_board_stat_daily_copy a
						LEFT JOIN pig_dd_employee_info pdei ON a.operator = pdei.user_id
						LEFT JOIN mbg_core.pig_dd_user_info pdui ON a.operator = pdui.userid
						WHERE
							stat_date BETWEEN '2023-02-27'
					AND '2023-03-04' 
						AND operator > 1000
						AND operate_type IN (0, 3)
						GROUP BY
							operator
						ORDER BY
							stat_cost ASC
					) a,
					(SELECT @rowNum := 0) b
			) a
		LEFT JOIN (
			SELECT
				a.*
			FROM
				(
					SELECT
						operator,
						operator_name,
						ROUND(SUM(cost) / 1000000, 2) cost,
						ROUND(SUM(goal_cost) / 1000000, 2) goal_cost,
						SUM(ad_created) ad_created,
						SUM(valid_ads) valid_ads,
						SUM(operates) operates,
						pdei.position,
						a.dept_id,
						a.dept_name,
						pdui.avatar
					FROM
						account_board_stat_daily_copy a
					LEFT JOIN pig_dd_employee_info pdei ON a.operator = pdei.user_id
					LEFT JOIN mbg_core.pig_dd_user_info pdui ON a.operator = pdui.userid
					WHERE
						stat_date BETWEEN '2023-02-20'
					AND '2023-02-26'
					AND operator > 0
					AND operate_type IN (0, 3)
					GROUP BY
						operator
					ORDER BY
						cost
				) a
		) b ON a.operator = b.operator
	) a
WHERE
	1 = 1
ORDER BY
	a.num ASC
LIMIT 0,
 100

Abnormal Result:
Health

Execution Plan

Normal Result TiDB Instance
Health

Execution Plan

translator_bot · June 22, 2024, 3:23pm

| username: h5n1 | Original post link

Was this version upgraded?

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

Upgraded from version 5. After the upgrade, all table execution plans were rerun. It’s been 5 months since the upgrade.

translator_bot · June 22, 2024, 3:23pm

| username: h5n1 | Original post link

Check the version using tiup cluster display, then check the MySQL connection prompt information and select version() on the two instances to see if the versions are consistent.

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

The image you provided is not visible. Please provide the text you need translated.

translator_bot · June 22, 2024, 3:23pm

| username: 爱白话的晓辉 | Original post link

Are all the component versions displayed on the dashboard consistent?

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

K8s installation, TiDB configuration versions are all consistent.

translator_bot · June 22, 2024, 3:23pm

| username: h5n1 | Original post link

Check the images of the two TiDB instances again.

translator_bot · June 22, 2024, 3:23pm

| username: 爱白话的晓辉 | Original post link

Are the inconsistent results random or specific to a particular TiDB instance?

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

Specified TiDB instance

translator_bot · June 22, 2024, 3:23pm

| username: 爱白话的晓辉 | Original post link

Try taking the faulty one offline and then bringing it back online.

translator_bot · June 22, 2024, 3:23pm

| username: 裤衩儿飞上天 | Original post link

Plan cache?

translator_bot · June 22, 2024, 3:23pm

| username: WalterWj | Original post link

The execution results are not fixed. Generally, it depends on whether the SQL results are affected by the order of the query results or whether the sorting fields of the entire pagination are complete and non-repetitive.

Different nodes have different execution plans. Try the following:

Manually collect the statistics of the relevant tables.
Restart the tidb-server to clear the table information cached in memory.

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

Rerunning the execution plan did not take effect, but restarting the abnormal tidb-server restored it.
How should this be avoided? Should we regularly restart the tidb server?

translator_bot · June 22, 2024, 3:23pm

| username: WalterWj | Original post link

Try upgrading, it feels like the table information in TiDB hasn’t been updated. This is quite rare.

translator_bot · June 22, 2024, 3:23pm

| username: buddyyuan | Original post link

Statistics are not in memory, or it might be an issue with asynchronous loading.

translator_bot · June 22, 2024, 3:23pm

| username: h5n1 | Original post link

I feel that the upgrade process for this problematic pod is still incomplete.

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

There are a total of 13 TiDB instances, and now multiple instances are found to be abnormal. It has nothing to do with incomplete upgrades.

translator_bot · June 22, 2024, 3:23pm

| username: foxchan | Original post link

The TiDB instances in the cluster were restarted one by one, but the results are still incorrect.
The results are incorrect when sorted by cost.

translator_bot · June 22, 2024, 3:23pm

| username: Running | Original post link

The statistics of tidbserver are incorrect. You can try executing ANALYZE TABLE on each tidbserver node to recalculate the statistics.