Issue with Unique Key Not Matching Expectations in Version 6.1.7

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 6.1.7版本唯一键与预期不符问题

| username: xxxxxxxx

Bug Report
Clearly and accurately describe the issue you found. Providing any possible steps to reproduce the issue will help the development team address it promptly.
[TiDB Version] 6.1.7
[Impact of the Bug]
Unique key reports duplicate value issue, causing data insertion failure.

[Possible Steps to Reproduce the Issue]

  1. During the initialization of a new cluster with version 6.1.7, the parameter [new_collations_enabled_on_first_bootstrap: true] was specified.

  2. Create a test table with [COLLATE=utf8mb4_general_ci]

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `name` char(10) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `uk_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec)
  1. Inserting test data reports a unique key conflict, but in reality, the two values are non-conflicting.
mysql> insert into t1 values(1,'a'),(2,'a ');
ERROR 1062 (23000): Duplicate entry 'a' for key 'uk_name'
mysql>

[Observed Unexpected Behavior]
It appears that single or multiple trailing spaces in the string are removed, resulting in 'a ’ being stored as ‘a’.

[Expected Behavior]
For fields with unique index constraints, both ‘a’ and 'a ’ should be able to be inserted normally.

[Related Components and Specific Versions]

[Other Background Information or Screenshots]
Such as cluster topology, system and kernel version, application app information, etc.; if the issue is related to SQL, please provide SQL statements and related table schema information; if there are critical errors in node logs, please provide relevant node log content or files; if some business-sensitive information is inconvenient to provide, please leave contact information, and we will communicate with you privately.

| username: 小龙虾爱大龙虾 | Original post link

This is normal behavior, not a bug. Refer to:
https://dev.mysql.com/doc/refman/8.4/en/charset-binary-collations.html#charset-binary-collations-trailing-space-comparisons
https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-sets.html#charset-unicode-sets-pad-attributes

| username: 鱼跃龙门 | Original post link

The official documentation shows

| username: Kongdom | Original post link

:thinking: That’s true, the official documentation does mention it. This handling seems quite user-friendly. Previously, when using SQL Server, trailing spaces were not handled, and I got tripped up by this situation~

| username: 濱崎悟空 | Original post link

Mark it~

| username: TiDBer_H5NdJb5Q | Original post link

Never noticed this before, learned something new. :ox:

| username: lemonade010 | Original post link

Mark this down, I really haven’t noticed this before.