Sync_diff_inspector comparison error, prompting Illegal mix of collations, but the upstream and downstream collations are the same

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sync_diff_inspector对比报错,提示 Illegal mix of collations,但上下游collation是一样的

| username: dba-kit

The version is 6.1.2, and it is synchronized to the downstream TiDB through DM. An error is reported during the comparison. After manually checking the table structures of the upstream and downstream, they are consistent, both being DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci.
The error log is as follows, with a few peculiar points:

  1. It indicates an Illegal mix of collations (utf8_unicode_ci, IMPLICIT) and (utf8mb4_general_ci, COERCIBLE) for operation ‘>’.
  2. The primary keys are standard letters and numbers, but the boundary values found by the program are all garbled characters.
[2022/11/23 14:27:58.480 +08:00] [WARN] [utils.go:764] ["execute checksum query fail"] [query="SELECT COUNT(*) as CNT, BIT_XOR(CAST(CRC32(CONCAT_WS(',', `broker`, `account3_id`, `payment_method_id`, `create_time`, `update_time`, CONCAT(ISNULL(`broker`), ISNULL(`account3_id`), ISNULL(`payment_method_id`), ISNULL(`create_time`), ISNULL(`update_time`))))AS UNSIGNED)) as CHECKSUM FROM `db`.`trade_bank_card` WHERE (((`broker` < ?) OR (`broker` = ? AND `account3_id` < ?) OR (`broker` = ? AND `account3_id` = ? AND `payment_method_id` <= ?)) AND (TRUE));"] [args="[\"\\u000e)\\u000e)\\u000e)\\u000e1\",\"\\u000e)\\u000e)\\u000e)\\u000e1\",\"\\u000e*\\u000e,\\u000e,\\u000e1\\u000e,\\u000e.\\u000e1\",\"\\u000e)\\u000e)\\u000e)\\u000e1\",\"\\u000e*\\u000e,\\u000e,\\u000e1\\u000e,\\u000e.\\u000e1\",\"\\u000e3\\u000f.\\u000f\\u0010\\u000e+\\u000f.\\u000em\\u000em\\u000e)\\u000e\\ufffd\\u000e\\ufffd\\u000e0\\u0010^\"]"] [error="Error 1267: Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '<='"]
[2022/11/23 14:27:58.480 +08:00] [WARN] [utils.go:764] ["execute checksum query fail"] [query="SELECT COUNT(*) as CNT, BIT_XOR(CAST(CRC32(CONCAT_WS(',', `broker`, `account3_id`, `payment_method_id`, `create_time`, `update_time`, CONCAT(ISNULL(`broker`), ISNULL(`account3_id`), ISNULL(`payment_method_id`), ISNULL(`create_time`), ISNULL(`update_time`))))AS UNSIGNED)) as CHECKSUM FROM `db`.`trade_bank_card` WHERE (((`broker` = ?) AND ((`account3_id` > ?) OR (`account3_id` = ? AND `payment_method_id` > ?)) AND ((`account3_id` < ?) OR (`account3_id` = ? AND `payment_method_id` <= ?))) AND (TRUE));"] [args="[\"\\u000e)\\u000e)\\u000e)\\u000e1\",\"\\u000e+\\u000e+\\u000e,\\u000e1\\u000e1\\u000e+\\u000e0\",\"\\u000e+\\u000e+\\u000e,\\u000e1\\u000e1\\u000e+\\u000e0\",\"\\u000e`\\u000e0\\u000e,\\u000f\\ufffd\\u0010Q\\u0010Z\\u000e,\\u0010\\u0002\\u000e2\\u000e/\\u0010j\\u000f\\ufffd\",\"\\u000e+\\u000e-\\u000e1\\u000e)\\u000e-\\u000e*\\u000e-\",\"\\u000e+\\u000e-\\u000e1\\u000e)\\u000e-\\u000e*\\u000e-\",\"\\u000e`\\u0010\\u001f\\u000e/\\u000f\\ufffd\\u000e\\ufffd\\u000em\\u0010D\\u000e,\\u0010D\\u000f[\\u0010\\u0002\\u000e\\ufffd\"]"] [error="Error 1267: Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '>'"]
[2022/11/23 14:27:58.480 +08:00] [WARN] [utils.go:764] ["execute checksum query fail"] [query="SELECT COUNT(*) as CNT, BIT_XOR(CAST(CRC32(CONCAT_WS(',', `broker`, `account3_id`, `payment_method_id`, `create_time`, `update_time`, CONCAT(ISNULL(`broker`), ISNULL(`account3_id`), ISNULL(`payment_method_id`), ISNULL(`create_time`), ISNULL(`update_time`))))AS UNSIGNED)) as CHECKSUM FROM `db`.`trade_bank_card` WHERE (((`broker` = ?) AND ((`account3_id` > ?) OR (`account3_id` = ? AND `payment_method_id` > ?)) AND ((`account3_id` < ?) OR (`account3_id` = ? AND `payment_method_id` <= ?))) AND (TRUE));"] [args="[\"\\u000e)\\u000e)\\u000e)\\u000e1\",\"\\u000e*\\u000e,\\u000e,\\u000e1\\u000e,\\u000e.\\u000e1\",\"\\u000e*\\u000e,\\u000e,\\u000e1\\u000e,\\u000e.\\u000e1\",\"\\u000e3\\u000f.\\u000f\\u0010\\u000e+\\u000f.\\u000em\\u000em\\u000e)\\u000e\\ufffd\\u000e\\ufffd\\u000e0\\u0010^\",\"\\u000e*\\u000e.\\u000e1\\u000e0\\u000e+\\u000e0\\u000e1\",\"\\u000e*\\u000e.\\u000e1\\u000e0\\u000e+\\u000e0\\u000e1\",\"\\u000eJ\\u000e-\\u000e.\\u000e\\ufffd\\u000f.\\u000e3\\u000e\\ufffd\\u000e1\\u000e3\\u000e-\\u0010\\u0002\\u000e-\"]"] [error="Error 1267: Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '>'"]
[2022/11/23 14:27:58.488 +08:00] [WARN] [utils.go:764] ["execute checksum query fail"] [query="SELECT COUNT(*) as CNT, BIT_XOR(CAST(CRC32(CONCAT_WS(',', `broker`, `account3_id`, `payment_method_id`, `create_time`, `update_time`, CONCAT(ISNULL(`broker`), ISNULL(`account3_id`), ISNULL(`payment_method_id`), ISNULL(`create_time`), ISNULL(`update_time`))))AS UNSIGNED)) as CHECKSUM FROM `db`.`trade_bank_card` WHERE (((`broker` = ?) AND ((`account3_id` > ?) OR (`account3_id` = ? AND `payment_method_id` > ?)) AND ((`account3_id` < ?) OR (`account3_id` = ? AND `payment_method_id` <= ?))) AND (TRUE));"] [args="[\"\\u000e)\\u000e)\\u000e)\\u000e1\",\"\\u000e+\\u000e+\\u000e,\\u000e1\\u000e1\\u000e+\\u000e0\",\"\\u000e+\\u000e+\\u000e,\\u000e1\\u000e1\\u000e+\\u000e0\",\"\\u000e`\\u000e0\\u000e,\\u000f\\ufffd\\u0010Q\\u0010Z\\u000e,\\u0010\\u0002\\u000e2\\u000e/\\u0010j\\u000f\\ufffd\",\"\\u000e+\\u000e-\\u000e1\\u000e)\\u000e-\\u000e*\\u000e-\",\"\\u000e+\\u000e-\\u000e1\\u000e)\\u000e-\\u000e*\\u000e-\",\"\\u000e`\\u0010\\u001f\\u000e/\\u000f\\ufffd\\u000e\\ufffd\\u000em\\u0010D\\u000e,\\u0010D\\u000f[\\u0010\\u0002\\u000e\\ufffd\"]"] [error="Error 1105: runtime error: index out of range [24] with length 24"]
| username: WalterWj | Original post link

admin check and try

| username: 海石花47 | Original post link

Could you share the table structure? Did you ignore fields like text and blob?

| username: dba-kit | Original post link

The table structure is very simple, with only three varchar fields and two datetime fields. Both sides have been compared, and they are the same.

| username: dba-kit | Original post link

I also checked this, and there was no error message. I manually dumped the table, and there was no error either.

| username: dba-kit | Original post link

Could it be related to the default activation of new_collations_enabled_on_first_bootstrap in version 6.1?

| username: 海石花47 | Original post link

Check the table status to see if the collections on both sides are the same. Are they both utf8_unicode_ci?

| username: dba-kit | Original post link

The table structures on both sides are the same, and the collate settings are also the same.

| username: Leavrth | Original post link

Execute this SQL in the downstream database specified in the sync-diff-inspector configuration file to see if there are any anomalies:
SHOW STATS_BUCKETS WHERE db_name= ‘…’ AND table_name= ‘…’;

| username: dba-kit | Original post link

It is indeed quite strange; it is a garbled segmentation.

| username: dba-kit | Original post link

May I ask, which system table is this query targeting? Could it be an issue with the character set of the system table?

| username: Leavrth | Original post link

In mysql.stats_buckets

| username: dba-kit | Original post link

I checked that the character set of this table is utf8mb4_bin. So, if the boundaries of buckets from other character sets are written into it, will the character set then become utf8mb4_bin?

| username: dba-kit | Original post link

Created a new table with the same structure using utf8mb4, and after importing the data, I found that the values in this table were normal and contained the correct numbers.

| username: Leavrth | Original post link

Well, future versions will attempt to fix this issue.

| username: dba-kit | Original post link

Hmm, is there a way to prevent it from reading this table now? I remember that if both sides are MySQL, the chunk boundaries will be calculated based on random numbers.

| username: Leavrth | Original post link

Yes, the current workaround is to force this table to use a non-existent index in the sync-diff configuration, for example:

######################### Task config #########################
[task]
        output-dir = "./output"

        source-instances = ["mysql1"]

        target-instance = "tidb0"

        target-check-tables = ["schema*.table*", "!c.*", "test2.t2"]
        target-configs = ["config1"]

######################### Table config #########################
[table-configs.config1]
target-tables = ["schema*.test*", "test2.t2"]  # Specify the problematic table
index-fields = ["broker"]
| username: dba-kit | Original post link

Uh, according to the instructions, setting index-fields = [“idx_not_exist”] reports that the column does not exist. Do I have to set index_fields for hundreds of tables separately?

| username: dba-kit | Original post link

“[“failed to build bucket iterator, fall back to use random iterator”] [error=“no index to split buckets not found”]”
However, according to this method, it is indeed possible to check a single table. Is there a simpler way to set it for all tables?

| username: Leavrth | Original post link

I checked it, and it is available. Just replace index-fields with range.

######################### Task config #########################
[task]
        output-dir = "./output"

        source-instances = ["mysql1"]

        target-instance = "tidb0"

        target-check-tables = ["schema*.table*", "!c.*", "test2.t2"]
        target-configs = ["config1"]

######################### Table config #########################
[table-configs.config1]
target-tables = ["schema*.test*", "test2.t2"]  # Specify the problematic tables
range = "true and true"