TiFlash Calculation Error, TiKV Calculation Correct, Turns Out to Be Related to Automatic Cast

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash计算结果错误,tikv计算正确,原来是自动cast有关

| username: anafin

SQL: The logic is as follows, and it has been desensitized, so it looks a bit odd.
Explanation:

  1. The main table is t_order, and the id of t_order is order_id (not exceeding the int range), similar to int unsigned. The other three tables are associated with this table through order_id, type int. All four tables are in TiFlash.
  2. Tried rebuilding all tables in TiFlash, but they were still converted to decimal, causing incorrect results.
  3. Tried changing t_order_exxt_old.order_id from int to bigint, but it was still converted to decimal, causing incorrect results.

select
COUNT(DISTINCT a.order_id)
from
t_ocwprice a # order_id int
inner join t_log b on a.order_id=b.order_id # order_id type int
inner join t_order c on a.order_id=c.id # c.id type int unsigned
inner join t_order_exxt_old d on c.id=d.order_id # Execution plan found: cast(t_order_exxt_old.order_id, decimal(20,0))->Column#224, incorrect result, much less than actual.
#inner join t_order_exxt_old d on c.id=cast( d.order_id as UNSIGNED ) # d.order_id type int, manually cast to unsigned, correct result. So I infer that the incorrect result is due to TiFlash type conversion error. Why does it need to be converted to decimal type here??
where b.ctime>UNIX_TIMESTAMP(20240301)
and (b.bmonth=0 or b.bmonth=202403)
and c.btime>=UNIX_TIMESTAMP(20240301);

—This bug is related to the table design not being rigorous, but I couldn’t find relevant information. So has this bug been discovered by the official team, and is there hope for a fix?

| username: anafin | Original post link

Testing found that this bug is triggered when a, b, then c, and finally d table is processed last, causing the decimal type conversion (see the screenshot below, for de-sensitization purposes, the info content is not included, the conversion seen in info: cast(t_order_exxt_old.order_id, decimal (20,0))->Column#224). If the d table data is relatively small, the execution plan processes a and b first, then d, and finally c, no type conversion occurs, and the result is correct.

| username: zhaokede | Original post link

Type int unsigned

Can all tables be changed to the same type?

| username: anafin | Original post link

Logically, it is possible. However, the table is large, and the primary key of t_order_exxt_old is order_id. TiDB does not support changing the primary key from int to unsigned, and making this change requires stopping the service.

| username: Billmay表妹 | Original post link

I’ll ask around on the intranet for you.

| username: Billmay表妹 | Original post link

Ask him to simplify it and provide one or two lines of the case that went wrong.