Issues with the strcmp function

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: strcmp函数的问题

| username: daaxiang

[TiDB Usage Environment] Poc
[TiDB Version] 8.0.11-TiDB-v7.4.0
[Problem Encountered: strcmp function]
image
MySQL [test]> select strcmp(‘ABC’,‘ABc’);
±--------------------+
| strcmp(‘ABC’,‘ABc’) |
±--------------------+
| 0 |
±--------------------+
1 row in set (0.00 sec)

Question: In TiDB-v7.4.0, are the characters ‘ABC’ and ‘ABc’ considered equal?

| username: WalterWj | Original post link

I see TiDB and MySQL perform the same :thinking:

| username: daaxiang | Original post link

Then it’s different from the tutorial. Did I make a mistake somewhere?
The TiDB in the tutorial is version V6.1.

| username: ShawnYan | Original post link

Take a look at the character validation rules.

| username: 托马斯滑板鞋 | Original post link

:upside_down_face: The default configuration for 7.1.1 is -1.

| username: forever | Original post link

Check what the default collation of the database is.

| username: dba远航 | Original post link

This is related to character set collation. I have encountered a similar issue before. Using utf8mb4_bin does not have this problem.

| username: forever | Original post link

TiDB is case-sensitive by default. When you directly use CREATE DATABASE xxx, it uses the default settings. Using CREATE DATABASE xxx SET utf8mb4 COLLATE utf8mb4_general_ci; will result in differences.
Run SELECT @@character_set_database, @@collation_database; to check what is currently being used.

| username: 托马斯滑板鞋 | Original post link

I can’t view images directly. Please provide the text you need translated.

| username: 小龙虾爱大龙虾 | Original post link

This is related to character set collation, please see:

MySQL [test]> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select STRCMP('ABC','ABc');
+---------------------+
| STRCMP('ABC','ABc') |
+---------------------+
|                  -1 |
+---------------------+
1 row in set (0.00 sec)

MySQL [test]> SET NAMES utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select STRCMP('ABC','ABc');
+---------------------+
| STRCMP('ABC','ABc') |
+---------------------+
|                   0 |
+---------------------+
1 row in set (0.00 sec)

MySQL [test]> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v6.5.4 |
+--------------------+
1 row in set (0.00 sec)

Character set and collation reference: 字符集和排序规则 | PingCAP 文档中心
Collation precedence reference: https://dev.mysql.com/doc/refman/5.7/en/charset-collation-coercibility.html

According to the precedence rules, the collation used here is the session-level collation.

| username: daaxiang | Original post link

The results seem to be different.

| username: 小龙虾爱大龙虾 | Original post link

The variables you are looking at are the default character set and collation of the database. The direct comparison of two texts does not use this. You need to carefully read this article to clarify the character set and collation used in various situations. Refer to: 字符集和排序规则 | PingCAP 文档中心

| username: forever | Original post link

Check the session level.

| username: daaxiang | Original post link

This is the correct answer. My client’s sorting rule defaults to utf8_general_ci;
image
After setting it as the default, it meets expectations.
image
To be honest, TiDB’s documentation is quite confusing. I had to refer to the MySQL manual to get some clarity.

Thanks to everyone above for the help.

| username: zhanggame1 | Original post link

Whether to ignore case in collation will affect LIKE matching and string comparison.

| username: zxgaa | Original post link

Take a look at the character set.

| username: swino | Original post link

Check whether the character set and collation were specified when the database was created, as well as the character set configuration file.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.