Unable to get collation ID by name, using ID of the default collation instead

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Unable to get collation ID by name, use ID of the default collation instead

| username: TiDBer_hxqv9az1

[TiDB Usage Environment] Production Environment / Testing / Poc
Production Environment
[TiDB Version]
5.7.25-TiDB-v7.5.1
[Reproduction Path] What operations were performed to cause the issue

  1. Checked all component logs, only tidbserver had collation-related warnings.
  2. Opened the TiDB general log, executed the SQL before and after the context of the warning separately, but could not reproduce it.
  3. Always generates 9-25 warnings in batches instantly.
  4. show variables like ‘%collation%’;

±--------------------------------------±-------------------+
| Variable_name | Value |
±--------------------------------------±-------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_bin |
| collation_server | utf8mb4_bin |
| default_collation_for_utf8mb4 | utf8mb4_bin |
| tidb_hash_exchange_with_new_collation | ON |
±--------------------------------------±-------------------+

  1. select * from mysql.tidb where variable_name = ‘new_collation_enabled’;

±----------------------±---------------±---------------------------------------------------+
| VARIABLE_NAME | VARIABLE_VALUE | COMMENT |
±----------------------±---------------±---------------------------------------------------+
| new_collation_enabled | True | If the new collations are enabled. Do not edit it. |
±----------------------±---------------±---------------------------------------------------+

  1. select COLLATION_NAME, count(1) from COLUMNS group by COLLATION_NAME;

±-------------------±---------+
| COLLATION_NAME | count(1) |
±-------------------±---------+
| utf8_bin | 12 |
| utf8_general_ci | 1 |
| utf8mb4_unicode_ci | 2438 |
| utf8mb4_bin | 2185 |
| NULL | 7806 |
| utf8mb4_general_ci | 6 |
±-------------------±---------+

  1. select TABLE_COLLATION, count(1) ac from TABLES group by TABLE_COLLATION;

±-------------------±----+
| TABLE_COLLATION | ac |
±-------------------±----+
| NULL | 1 |
| utf8mb4_bin | 811 |
| utf8mb4_unicode_ci | 455 |
| utf8_bin | 1 |
±-------------------±----+

[Encountered Issue: Phenomenon and Impact]
The TiDB log keeps refreshing with:
[WARN] [collate.go:373] [“Unable to get collation ID by name, use ID of the default collation instead”] [name=] [“default collation ID”=-46] [“default collation”=utf8mb4_bin]
[WARN] [collate.go:373] [“Unable to get collation ID by name, use ID of the default collation instead”] [name=] [“default collation ID”=-46] [“default collation”=utf8mb4_bin]

[Expectation]

  1. Obtain an observable method to locate this issue. Using tshark to capture packets, quer.mysql cannot parse the information.
  2. Fundamentally resolve this warning, either by adjusting business SQL, modifying some application connection configuration parameters, adjusting TiDB parameters, or upgrading this version.
    [Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
    [Attachments: Screenshots/Logs/Monitoring]
| username: Miracle | Original post link

Is there a specified collation when connecting to the service?

| username: TiDBer_hxqv9az1 | Original post link

No collation specified, the connection string specifies two character sets: utf8 and UTF-8.
Tested with 8.0 JDBC alone, no warnings appeared.

| username: Jasper | Original post link

It looks like an unsupported collation was used in TiDB, and it was replaced by the default value utf8mb4_bin. Refer to this link for more details: 字符集和排序规则 | PingCAP 文档中心

| username: zhaokede | Original post link

Check the table creation statement to see what collation is being used.