Questions about Handling Spaces in Primary Key Values

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 主键值对空格的处理疑问

| username: hey-hoho

[TiDB Usage Environment] Test
[TiDB Version] v7.1.0
[Reproduction Path]

Found some strange behavior in version 7.1 when there are spaces in the primary key:

mysql> select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select @@global.tidb_enable_clustered_index ;
+--------------------------------------+
| @@global.tidb_enable_clustered_index |
+--------------------------------------+
| ON                                   |
+--------------------------------------+
1 row in set (0.00 sec)

Clustered table behavior:

mysql> create table tt (id varchar(10) not null primary key);
Query OK, 0 rows affected (0.12 sec)

mysql> show create table tt;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                         |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tt    | CREATE TABLE `tt` (
  `id` varchar(10) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into tt values('a ');
Query OK, 1 row affected (0.00 sec)

mysql> insert into tt values('a');
ERROR 1062 (23000): Duplicate entry 'a' for key 'tt.PRIMARY'

mysql> select *,length(id),char_length(id) from tt;
+----+------------+-----------------+
| id | length(id) | char_length(id) |
+----+------------+-----------------+
| a  |          2 |               2 |
+----+------------+-----------------+
1 row in set (0.00 sec)
mysql> select length('a'),char_length('a');
+-------------+------------------+
| length('a') | char_length('a') |
+-------------+------------------+
|           1 |                1 |
+-------------+------------------+
1 row in set (0.00 sec)

mysql> select length('a '),char_length('a ');
+--------------+-------------------+
| length('a ') | char_length('a ') |
+--------------+-------------------+
|            2 |                 2 |
+--------------+-------------------+
1 row in set (0.00 sec)

‘a’ and 'a ’ are not the same for the id field, causing a primary key conflict when writing.

The same applies to non-clustered tables:

mysql> create table tt2 (id varchar(10) not null primary key NONCLUSTERED);
Query OK, 0 rows affected (0.13 sec)

mysql> show create table tt2;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                             |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tt2   | CREATE TABLE `tt2` (
  `id` varchar(10) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into tt2 values('a ');
Query OK, 1 row affected (0.01 sec)

mysql> insert into tt2 values('a');
ERROR 1062 (23000): Duplicate entry 'a' for key 'tt2.PRIMARY'
mysql> select *,length(id),char_length(id) from tt2;
+----+------------+-----------------+
| id | length(id) | char_length(id) |
+----+------------+-----------------+
| a  |          2 |               2 |
+----+------------+-----------------+
1 row in set (0.01 sec)

[Encountered Problem: Phenomenon and Impact]

The same use case can be written in version 4.0.14:


my:root@127.0.0.1:4000=> select tidb_version();
                          tidb_version()
-------------------------------------------------------------------
 Release Version: v4.0.14                                         +
 Edition: Community                                               +
 Git Commit Hash: 4f919b07f8f013e48521894c70cb69004f43c622        +
 Git Branch: heads/refs/tags/v4.0.14                              +
 UTC Build Time: 2021-07-26 10:53:03                              +
 GoVersion: go1.13                                                +
 Race Enabled: false                                              +
 TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306+
 Check Table Before Drop: false
(1 row)
my:root@127.0.0.1:4000=>
my:root@127.0.0.1:4000=> use test;
USE
my:root@127.0.0.1:4000=> create table tt2 (id varchar(10) not null primary key);
CREATE TABLE
my:root@127.0.0.1:4000=> show create table tt2;
 Table |                        Create Table
-------+-------------------------------------------------------------
 tt2   | CREATE TABLE "tt2" (                                       +
       |   "id" varchar(10) NOT NULL,                               +
       |   PRIMARY KEY ("id")                                       +
       | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
(1 row)

my:root@127.0.0.1:4000=> insert into tt2 values('a ');
INSERT 1
my:root@127.0.0.1:4000=> insert into tt2 values('a');
INSERT 1
my:root@127.0.0.1:4000=>
my:root@127.0.0.1:4000=> select *,length(id),char_length(id) from tt;
error: mysql: 1146: Table 'test.tt' doesn't exist
my:root@127.0.0.1:4000=> select *,length(id),char_length(id) from tt2;
 id | length(id) | char_length(id)
----+------------+-----------------
 a  |          1 |               1
 a  |          2 |               2
(2 rows)

Not sure which version has the issue.

| username: yiduoyunQ | Original post link

| username: yiduoyunQ | Original post link

It is recommended that the application side handles deduplication for ‘a’ and 'a ’ as much as possible, rather than leaving it to the database side.

| username: forever | Original post link

MySQL handles it this way, ignoring the trailing spaces.

| username: hey-hoho | Original post link

It has been confirmed that it is a bug in the old collation of version 4.0. The normal behavior is to ignore trailing spaces. This bug has been fixed in version 6.0 and later with the new collation enabled by default, so the primary key conflict is expected.

| username: 哈喽沃德 | Original post link

It’s generally better for the frontend to handle this validation.

| username: vincentLi | Original post link

Regarding collation issues~ In addition to this, it is also important to note that the default collation has changed between the MySQL 5.7 driver and the 8.0 driver.