The true meaning of the parameter new_collations_enabled_on_first_bootstrap

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: new_collations_enabled_on_first_bootstrap 参数真正的含义

| username: Raymond

I encountered a very interesting issue when executing the following SQL statements:

create table a(id int, name char(4)) charset=utf8mb4 collate=utf8mb4_bin;
create table b(id int, name varchar(40)) charset=utf8mb4 collate=utf8mb4_bin;
insert into a values(1, 'test');
insert into b values(1, 'test   ');
select * from a, b where a.name = b.name;

In MySQL version 5.7.31, the join query returns data, but in TiDB version 5.3.1 (with new_collations_enabled_on_first_bootstrap set to false), it does not.


However, in version 6.1.0, the join query returns data (with new_collations_enabled_on_first_bootstrap set to true).

My personal thought is that this is related to how the varchar field type handles values with trailing spaces during queries. Based on the following tests:

create table t1 (id int, name varchar(10));
insert into t1 values(1, 'a'), (2, '  a'), (3, 'a  ');
select * from t1 where name = 'a';
select * from t1 where name = 'a  ';


In MySQL, if a varchar type stores values with trailing spaces, the query will also return the values with trailing spaces, for example, a and a are considered equal.

In TiDB, if a varchar type stores values with trailing spaces, the query will not return the values with trailing spaces, for example, a and a are considered unequal.

However, I am not sure why TiDB, when new_collations_enabled_on_first_bootstrap is set to true, handles varchar trailing spaces in the same way as MySQL.

| username: zzzzzz | Original post link

This configuration item only takes effect when the cluster is initially initialized. After initializing the cluster, you cannot enable or disable the new collation framework by changing this configuration item;