ERROR 1366 (HY000): incorrect utf8 value f09f8c80(🌀) for column a

In TiDB v2.1.1 and earlier versions, if the charset is UTF-8, inserted 4-byte data is not checked for valid UTF-8 encoding. This check is added in v2.1.2 and applies to all later versions.

  • Before upgrading: in v2.1.1 and earlier versions, the following statements succeed.

    create table t(a varchar(100) charset utf8);
    
    Query OK, 0 rows affected
    
    insert t values (unhex('f09f8c80'));
    
    Query OK, 1 row affected
    
  • After upgrading: in v2.1.2 and later versions, the same insert statement reports an error.

    insert t values (unhex('f09f8c80'));
    
    ERROR 1366 (HY000): incorrect utf8 value f09f8c80(🌀) for column a
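
To see why this particular value trips the check, the short shell sketch below confirms that the hex literal f09f8c80 from the error message is a single 4-byte UTF-8 sequence. It assumes, as in MySQL, that the utf8 charset stores at most 3 bytes per character. The variable names byte_count and lead_byte are illustrative only and are not part of TiDB.

```shell
# The hex literal from the error message, written as octal escapes so that
# POSIX printf can emit the raw bytes: f0 9f 8c 80 -> \360 \237 \214 \200.
byte_count=$(printf '\360\237\214\200' | wc -c | tr -d '[:space:]')

# Extract the first byte in hex. A lead byte of 0xf0 has the bit pattern
# 11110xxx, which marks the start of a 4-byte UTF-8 sequence.
lead_byte=$(printf '\360\237\214\200' | od -An -tx1 | awk '{print $1}')

echo "bytes=$byte_count lead=$lead_byte"
# prints: bytes=4 lead=f0
```

A value like this needs a charset that allows 4 bytes per character, which is what UTF8MB4 provides.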
    

Solution:

  • In v2.1.2: this version does not support modifying the column charset, so the only option is to skip the UTF-8 check.

    set @@session.tidb_skip_utf8_check=1;
    
    Query OK, 0 rows affected
    
    insert t values (unhex('f09f8c80'));
    
    Query OK, 1 row affected
    
  • In v2.1.3 and later versions: it is recommended to change the column charset to UTF8MB4. Alternatively, you can set tidb_skip_utf8_check to skip the UTF-8 check, but skipping the check might cause replication from TiDB to MySQL to fail, because MySQL performs the check.

    alter table t change column a a varchar(100) character set utf8mb4;
    
    Query OK, 0 rows affected
    
    insert t values (unhex('f09f8c80'));
    
    Query OK, 1 row affected
    

    Specifically, the variable tidb_skip_utf8_check skips the validity check on both UTF-8 and UTF8MB4 data.

    If you only want to skip the UTF-8 check, you can set tidb_check_mb4_value_in_utf8 instead. The corresponding configuration item, check-mb4-value-in-utf8, is added to the config.toml file in v2.1.3. Modify it in the configuration file and then restart the cluster for the change to take effect.
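
    As a rough sketch, the configuration change might look like the fragment below. It assumes that check-mb4-value-in-utf8 is a boolean item at the top level of config.toml and that false skips the check. Confirm the exact location and default against the config.toml.example shipped with your version.

    ```toml
    # Assumption: top-level boolean item. Set to false to skip the check on
    # 4-byte characters in utf8 columns, or true to enforce it.
    check-mb4-value-in-utf8 = false
    ```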

    Starting from v2.1.5, you can also set tidb_check_mb4_value_in_utf8 through the HTTP API or the session variable:

    • HTTP API (the HTTP API can be enabled only on a single server)

      • To enable the check through the HTTP API:

        curl -X POST -d "check_mb4_value_in_utf8=1" http://{TiDBIP}:10080/settings
        
      • To disable the check through the HTTP API:

        curl -X POST -d "check_mb4_value_in_utf8=0" http://{TiDBIP}:10080/settings
        
    • Session variable

      • To enable the check through the session variable:

        set @@session.tidb_check_mb4_value_in_utf8 = 1;
        
      • To disable the check through the session variable:

        set @@session.tidb_check_mb4_value_in_utf8 = 0;