When using Data Migration (DM) in TiDB v6.5 for full + binlog real-time synchronization, fields in the downstream TiDB database contain garbled characters

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用TIDBv6.5的 Data Migration进行全量 + Binlog 实时同步时下游tidb数据库字段有乱码

| username: TiDBer_BejuqEuL

[TiDB Usage Environment] Testing
[TiDB Version] v6.5.3
[Encountered Issue: Problem Description and Impact]
I am using TiDB Data Migration (DM) for full + binlog real-time synchronization. The upstream is a MySQL database whose tables use the GBK character set, and the downstream is TiDB using utf8mb4. During the full synchronization, I noticed garbled text in the downstream tables. I resolved this by setting character_set_client=utf8mb4, character_set_connection=utf8mb4, and character_set_results=utf8mb4 in the mydumpers section.


However, in the incremental synchronization phase that follows the full load, new data written upstream arrives garbled downstream. I searched the documentation but found no configuration item that addresses this. Is there a way to fix garbled text caused by different upstream and downstream character sets during incremental synchronization? For the full dump, adding character_set_connection=utf8mb4 eliminated the garbled text downstream. Is there an equivalent parameter for incremental synchronization?
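
To make the symptom concrete, here is a minimal Python sketch of one common way this kind of garbling arises: GBK bytes are interpreted with the wrong character set somewhere between the two databases. It is an illustration only. The sample string is made up, and nothing in this thread confirms that this is what happens inside DM during incremental synchronization.

```python
# Illustration only: how a character set mismatch garbles text.
# The sample value below is hypothetical, not taken from the original post.
text = "数据迁移"

# Bytes as a GBK-encoded upstream column would store them (2 bytes per character).
gbk_bytes = text.encode("gbk")

# Wrong handling: the GBK bytes are read as if they were latin1.
# Every byte becomes one unrelated character, which is the "garbled" result.
garbled = gbk_bytes.decode("latin1")

# Correct handling: decode with the real source charset, then re-encode as UTF-8,
# which is what a utf8mb4 downstream column expects.
utf8_bytes = gbk_bytes.decode("gbk").encode("utf-8")

print("garbled   :", garbled)
print("round trip:", utf8_bytes.decode("utf-8"))
```

The point of the sketch is that the conversion itself is lossless as long as every hop knows the real source encoding. Garbling appears only when one hop assumes a different one.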

| username: ShawnYan | Original post link

Regarding the character set issue, could you set up an intermediate database, convert the data to utf8mb4 there first, and then migrate to TiDB?

| username: redgame | Original post link

The reply above is right. Alternatively, you can try changing the character set of the downstream TiDB instance to match that of the upstream database. If no conversion is involved, there should not be any issues.

| username: TiDBer_vfJBUcxl | Original post link

It could also be an issue with the client character set.
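
One way to tell a client-side display problem from genuinely wrong stored data is to look at the raw bytes, for example with `SELECT HEX(col)` on the downstream table, and check which encoding they are valid in. The helper below is a rough sketch of that check. `guess_encoding` is a hypothetical name, and the candidate list is an assumption based on the character sets mentioned in this thread.

```python
# Hypothetical diagnostic helper: given the hex string returned by SELECT HEX(col),
# report the first candidate encoding in which the bytes decode cleanly.
def guess_encoding(hex_value, candidates=("utf-8", "gbk")):
    raw = bytes.fromhex(hex_value)
    for name in candidates:
        try:
            # Strict decoding raises UnicodeDecodeError on invalid byte sequences.
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None

# Example inputs built in Python; in practice these would come from HEX(col).
sample = "中文"
utf8_hex = sample.encode("utf-8").hex()
gbk_hex = sample.encode("gbk").hex()

print(guess_encoding(utf8_hex))
print(guess_encoding(gbk_hex))
```

If the bytes in a utf8mb4 column decode as UTF-8 into the expected text, the stored data is fine and the client character set is the likely culprit. If they decode as UTF-8 into unrelated characters, or only decode as GBK, the data was probably written with the wrong encoding. This is a heuristic and can misjudge short or ambiguous byte sequences.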