Does TiFlash really not support tables with the GBK character set, or is it just...

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash 真的不支持 gbk 字符集的表么,还是只是。。。

| username: ShawnYan

[TiDB Usage Environment] PoC
[TiDB Version] v7.2.0


Test case to reproduce:

  • Case 1: create the tables first, set TiFlash replicas, then add varchar columns. All tables, including the gbk one, can then be queried normally through TiKV/TiFlash.
create table tgbk (id int) charset = gbk;
create table tascii (id int) charset = ascii;
create table tbinary (id int) charset = binary;
create table tlatin1 (id int) charset = latin1;
create table tutf8mb4 (id int) charset = utf8mb4;
create table tutf8 (id int) charset = utf8;
create table t (id int);

alter table t        set tiflash replica 1;
alter table tascii   set tiflash replica 1;
alter table tbinary  set tiflash replica 1;
alter table tgbk     set tiflash replica 1;
alter table tlatin1  set tiflash replica 1;
alter table tutf8    set tiflash replica 1;
alter table tutf8mb4 set tiflash replica 1;

insert t        select 1;
insert tascii   select 1;
insert tbinary  select 1;
insert tgbk     select 1;
insert tlatin1  select 1;
insert tutf8    select 1;
insert tutf8mb4 select 1;

set @@session.tidb_isolation_read_engines = 'tikv';
-- set @@session.tidb_isolation_read_engines = 'tiflash';

select * from t        ;
select * from tascii   ;
select * from tbinary  ;
select * from tgbk     ;
select * from tlatin1  ;
select * from tutf8    ;
select * from tutf8mb4 ;

set @@session.tidb_isolation_read_engines = 'tikv,tiflash';

alter table t        add column c varchar(1);
alter table tascii   add column c varchar(1);
alter table tbinary  add column c varchar(1);
alter table tgbk     add column c varchar(1);
alter table tlatin1  add column c varchar(1);
alter table tutf8    add column c varchar(1);
alter table tutf8mb4 add column c varchar(1);

update t        set c = 'a';
update tascii   set c = 'a';
update tbinary  set c = 'a';
update tgbk     set c = 'a';
update tlatin1  set c = 'a';
update tutf8    set c = 'a';
update tutf8mb4 set c = 'a';
  • Case 2: however, if the tables are created with a character column and populated with data first, and the TiFlash replica is set afterwards, only the gbk table fails to create a TiFlash replica.
create schema yandb2;
use yandb2;

create table tgbk     (id int, c varchar(10)) charset = gbk;
create table tascii   (id int, c varchar(10)) charset = ascii;
create table tbinary  (id int, c varchar(10)) charset = binary;
create table tlatin1  (id int, c varchar(10)) charset = latin1;
create table tutf8mb4 (id int, c varchar(10)) charset = utf8mb4;
create table tutf8    (id int, c varchar(10)) charset = utf8;
create table t        (id int, c varchar(10));

insert t        select 1,'b';
insert tascii   select 1,'b';
insert tbinary  select 1,'b';
insert tgbk     select 1,'b';
insert tlatin1  select 1,'b';
insert tutf8    select 1,'b';
insert tutf8mb4 select 1,'b';

alter table tascii   set tiflash replica 1;
alter table tbinary  set tiflash replica 1;
alter table tgbk     set tiflash replica 1;
alter table tlatin1  set tiflash replica 1;
alter table tutf8    set tiflash replica 1;
alter table tutf8mb4 set tiflash replica 1;
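
To see which tables in this second case ended up with a replica, the replica metadata can be queried. This is a minimal sketch, assuming the `yandb2` schema from the statements above and the `information_schema.tiflash_replica` table as described in the TiDB documentation; it has not been run against this cluster.

```sql
-- List TiFlash replica status for the tables created in yandb2.
-- AVAILABLE = 1 means the replica is ready to serve reads;
-- PROGRESS shows how far replication has advanced (0 to 1).
select table_schema, table_name, replica_count, available, progress
from information_schema.tiflash_replica
where table_schema = 'yandb2'
order by table_name;
```

A table whose `set tiflash replica` statement was rejected should be absent from this result.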

PTAL~

| username: h5n1

The official documentation is very detailed. You can refer to the following link: TiDB Lightning Overview | PingCAP Docs

| username: zhanggame1

The default value of tidb_enable_clustered_index is INT_ONLY, which means that only tables with integer primary keys will use clustered indexes. If you want to enable clustered indexes for all tables, you need to set tidb_enable_clustered_index to ON.

| username: ShawnYan

Hmm…
Look at case 1: the table with the GBK character set already has a TiFlash replica.

| username: h5n1

It seems the restriction is not strictly enforced when setting up replicas. Try writing some Chinese characters and check the query results.
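
A check along these lines could look like the sketch below. It assumes the `tgbk` table from case 1 (which already has a TiFlash replica and a `varchar(1)` column `c`) and a client connection whose character set can carry the literal; the result is not shown because it was not verified here.

```sql
-- Write a Chinese character into the gbk table from case 1.
update tgbk set c = '中';

-- Read through TiKV first as the baseline.
set @@session.tidb_isolation_read_engines = 'tikv';
select id, c, hex(c) from tgbk;

-- Force the same read through TiFlash and compare the two results.
set @@session.tidb_isolation_read_engines = 'tiflash';
select id, c, hex(c) from tgbk;
```

Comparing `hex(c)` between the two engines makes an encoding mismatch visible even if the terminal renders both values the same way.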

| username: ShawnYan

Let me try.

Actually, what I'm unsure about is whether the limitation mentioned in this document applies only to older versions of TiDB. Perhaps the document was not updated after TiDB started supporting GBK. I also couldn't find where the restriction is enforced in the TiFlash code.

| username: cy6301567

We uniformly use CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

| username: ShawnYan

I'll open an issue to track this.

| username: windtalker

By agreement between TiDB and TiFlash, tables with GBK columns are not allowed to create TiFlash replicas, which is why there is no explicit restriction in the TiFlash code. In your example, creating a table without string columns and then adding a TiFlash replica reported no error because the table really had no GBK columns at that point. However, the later ALTER TABLE ADD COLUMN succeeding without an error can be considered a violation of the agreement between TiDB and TiFlash. Also, this restriction is not specific to older versions; TiFlash still does not support GBK.
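
Tables that slipped past the check in this way can be located by joining column metadata with replica metadata. This is a minimal sketch, assuming the standard `information_schema.columns` and `information_schema.tiflash_replica` tables; it has not been run against the cluster in this thread.

```sql
-- Find gbk columns that live in tables which already have a TiFlash replica,
-- i.e. tables in the state described above (replica first, gbk column later).
select c.table_schema, c.table_name, c.column_name, c.character_set_name
from information_schema.columns c
join information_schema.tiflash_replica r
  on r.table_schema = c.table_schema
 and r.table_name   = c.table_name
where c.character_set_name = 'gbk'
order by c.table_schema, c.table_name, c.ordinal_position;
```

With the case 1 tables in place, the `c` column of `tgbk` would be expected to appear in the result.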

| username: TiDBer_vfJBUcxl

Tables with the GBK character set cannot be replicated to TiFlash, and the following error will be reported:

ERROR 8200 (HY000): Unsupported ALTER table replica for table contain gbk charset

This means that a table using the GBK character set cannot use TiFlash.

Currently, the character sets supported by TiFlash are UTF8, UTF8MB4, ASCII, Latin1, and Binary.

Therefore, generally, it is not recommended to use the GBK character set when creating tables. If it is a migration or reconstruction project, it is recommended to convert the data to UTF8mb4.
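
One way to carry out such a conversion is to copy the rows into a new utf8mb4 table and set the replica there. This is only a sketch under assumptions: the target name `tgbk_utf8mb4` is hypothetical, the column definitions are copied from the `tgbk` table in this thread, and the approach has not been verified on a live cluster.

```sql
-- Hypothetical target table with the same columns but a utf8mb4 charset.
create table tgbk_utf8mb4 (id int, c varchar(10)) charset = utf8mb4;

-- Copy the existing rows; the string data is stored as utf8mb4 in the new table.
insert into tgbk_utf8mb4 select id, c from tgbk;

-- The new table contains no gbk columns, so a TiFlash replica can be requested.
alter table tgbk_utf8mb4 set tiflash replica 1;
```

For large tables a dedicated migration tool would likely be a better fit than a single `insert ... select`.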

| username: ShawnYan

Well, thank you for your answer.
Next, I'll follow up on how this case gets handled.

| username: ShawnYan

utf8mb4 is great.

By the way, that article was actually reposted from our community column.

| username: redgame

Bro, it’s true.

| username: TiDBer_vfJBUcxl

I saw it, the post in the community column was written by you.

| username: tony5413

The official statement says it is not supported.

| username: cy6301567

It is recommended to standardize to UTF8mb4.
