Garbled Chinese characters when migrating data from MySQL to TiDB with a DM cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB使用DM集群从MYSQL迁移数据到TIDB,出现中文乱码

| username: 每天当牛马

The Chinese data in our existing MySQL is stored with Latin1 encoding, while TiDB uses UTF8, so migrating from MySQL to TiDB with DM produces garbled Chinese characters. Can a DM migration task be configured to export the data with the Latin1 character set? Previously, with dumpling, the following command exported the Chinese text correctly:
/dumpling -h 127.0.0.1 -P 13306 -u root -p *#^ippbx^#* -t 16 -F 256MB -B core -o /home/tidb_cjx/tidb-community-toolkit-v7.5.0-linux-amd64/cjx --params "character_set_client=latin1,character_set_connection=latin1,character_set_results=Latin1,character_set_server=Latin1"
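
Editor's sketch: the garbling described above can be reproduced outside the database. The snippet below is a minimal illustration, not DM or dumpling code. It assumes the application originally wrote UTF-8 bytes into the Latin1 columns (the thread does not say whether the bytes are UTF-8 or GBK), and it uses Python's `latin-1` codec as a stand-in for MySQL's latin1, which is actually closer to cp1252.

```python
# Illustration of the mojibake mechanism only; not DM or dumpling code.
original = "中文"

# Assumption: the application sent UTF-8 bytes, and MySQL stored them
# unchanged in a latin1 column.
stored_bytes = original.encode("utf-8")

# An export that interprets those bytes as latin1 text and writes the
# result out as UTF-8 produces the garbled characters.
garbled = stored_bytes.decode("latin-1")

# Reading the garbled text back as latin1 recovers the raw bytes, which
# can then be decoded with the encoding the application really used.
recovered = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(recovered)
```

Forcing the connection character sets to latin1, as the `--params` option in the dumpling command does, has a similar effect: the server passes the stored bytes through without conversion, so the dump contains the bytes the application originally wrote.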

| username: wangccsy | Original post link

Character set issues are quite troublesome.

| username: 哈喽沃德 | Original post link

If it really doesn’t work, use ETL.

| username: 每天当牛马 | Original post link

What is this?

| username: 哈喽沃德 | Original post link

A database extraction tool (ETL: extract, transform, load).

| username: 每天当牛马 | Original post link

Are there any solutions? Where can I view the data from DM's full export?

| username: 小龙虾爱大龙虾 | Original post link

If incremental replication works without problems, just do the full data migration manually.

| username: dba-kit | Original post link

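See "DM 任务完整配置文件介绍" (Introduction to the Complete DM Task Configuration File) in the PingCAP Docs Center and configure the session variables under target-database.session. This configuration is used when DM writes to TiDB after parsing the binlog.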

| username: dba-kit | Original post link

It seems that parameters cannot yet be specified for the export phase. Although the official documentation says export parameters can be specified, I remember testing this on version 6.X, and adding --params had no effect. You can try version 7.5 to see whether it works now. Otherwise, you can export and import the full data manually and use the target-database.session parameter mentioned above for the incremental data.

| username: oceanzhang | Original post link

Is it not possible to specify this in the migration task?

| username: dba-kit | Original post link

I searched the source code: extra-args supports only a handful of command-line parameters, and --params is indeed not among them.

| username: 每天当牛马 | Original post link

Are there any other methods?

| username: 每天当牛马 | Original post link

Where does DM store the full export data? By default it should be in ./dumped_data, but I didn't see this folder generated during the migration.

| username: dba-kit | Original post link

This relative path is resolved against the deploy-dir of the worker node (yes, you read that right: deploy-dir, not data-dir).

| username: 每天当牛马 | Original post link

Is it under this deploy? I didn’t see any files generated.

| username: dba-kit | Original post link

Log in to the worker node where the task is running and you will be able to see it.

| username: dba远航 | Original post link

Automatic character set conversion requires support from migration tools; otherwise, it will result in garbled text. It can only be said that DM still needs improvement.

| username: TIDB-Learner | Original post link

Is yours UTF8 or UTF8mb4? If it’s UTF8, changing it to UTF8mb4 might be better.
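
Editor's sketch: the difference being asked about is the byte limit per character. In MySQL, utf8 is an alias for utf8mb3, which stores at most 3 bytes per character, while utf8mb4 allows up to 4. The helper below (`fits_in_utf8mb3` is a hypothetical name introduced for illustration) checks whether a string would fit in a 3-byte character set. It does not address the Latin1 garbling itself.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Common Chinese characters take 3 bytes each, so they fit in utf8mb3.
print(fits_in_utf8mb3("中文"))

# Emoji and other supplementary-plane characters take 4 bytes and need utf8mb4.
print(fits_in_utf8mb3("中文😀"))
```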

| username: 路在何chu | Original post link

Is the garbled text issue due to your client settings?

| username: wangccsy | Original post link

Character set handling.