Issue with Overlength VARCHAR

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: varchar超长问题

| username: Hacker_6ASfgBFe

[TiDB Usage Environment] Testing
[TiDB Version] 5.4
Background: A new table with the same structure and character set as the original table is created. When importing business data from the original table into the new table, an error occurs: ERROR 1406 (22001): Data Too Long, field len 225, data len 336.
Upon investigation, it was found that the issue is caused by a field of type varchar(225). I would like to ask, how was the data inserted into the original table?

| username: Jellybean | Original post link

In version 5.4, the space occupied by VARCHAR must not exceed 65535 bytes. When choosing the length of VARCHAR, it should be determined based on the size of the longest row and the character set used.

Under utf8mb4, the maximum column length range for VARCHAR is (0, 16383].

| username: Jellybean | Original post link

Confirm the table structure of the new table and the original table, and check whether the definitions of these two fields are consistent.

| username: Hacker_6ASfgBFe | Original post link

Exactly the same, copy-pasted. The original table contains data from business operations. I need to import the data into a new table, but it won’t go in.

| username: changpeng75 | Original post link

Check if there is indeed data with a length of 336.

| username: Jellybean | Original post link

This issue generally occurs because the actual data length does not match the defined length in the table structure. Since you are in a testing environment, try inserting the same field data into the original table to see if you can reproduce the problem. You can also provide the table and data after desensitization for everyone to take a look.

| username: Hacker_6ASfgBFe | Original post link

Indeed, there is.

| username: Hacker_6ASfgBFe | Original post link

Taking out that particular piece of data alone, it cannot be inserted. Right now, it’s unclear how the original table was able to insert it. Is there some setting, like sql_mode?

| username: changpeng75 | Original post link

Then we can only increase the character length of the target database.
There are indeed similar cases, and it’s not caused by the character set.

| username: Hacker_6ASfgBFe | Original post link

What is the reason for not being able to find it?

| username: changpeng75 | Original post link

It seems to be an old version; the case I saw was 4.0.
It should be unrelated to SQL_MODE. Even in non-strict mode, theoretically, it should truncate the data rather than store the excessively long data.

| username: Jellybean | Original post link

The sql_mode generally checks whether the data meets the requirements. If set, it will report an error when the data does not meet the requirements; if not set, it will print a warning without reporting an error, but the data will be truncated and inserted. For example:

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v5.4.0 |
+--------------------+
1 row in set (0.00 sec)
mysql> show variables like '%sql_mode%';
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                                                     |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| sql_mode      | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> create table a (name varchar(3));
Query OK, 0 rows affected (0.07 sec)
mysql> insert into a (name) values ('1234');
ERROR 1406 (22001): Data too long for column 'name' at row 1
mysql>
mysql> insert into a (name) values ('123');
Query OK, 1 row affected (0.01 sec)

mysql> set sql_mode="";
Query OK, 0 rows affected (0.00 sec)
mysql> insert into a (name) values ('1234');
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> show warnings;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1406 | Data Too Long, field len 3, data len 4 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)

mysql> select * from a ;
+------+
| name |
+------+
| 123  |
| 123  |
+------+
2 rows in set (0.00 sec)

It has nothing to do with sql_mode.

Moreover, through regular access via tidb-server, I have verified that it works normally on my side and I could not reproduce the phenomenon described by the original poster.

| username: Hacker_6ASfgBFe | Original post link

We upgraded from version 4.0 to 5.4.

| username: changpeng75 | Original post link

Was it an in-place upgrade at the beginning?

| username: Kongdom | Original post link

Could you post the old and new table structures? It seems a bit confusing~
Also, could you manually check if the old data is really 336?

| username: xiaoqiao | Original post link

Is it caused by garbled characters? I have encountered a similar issue before where both sides had the same structure, but there were garbled characters during the process. The target end calculated based on the garbled characters, resulting in an overflow.

| username: DBAER | Original post link

Please provide the table structures, sql_mode, and the database versions for both the source and target.

| username: shigp_TIDBER | Original post link

This kind of issue requires confirmation on whether there has been a TiDB version upgrade after the original table data was inserted, and whether there have been any changes to the sql_mode setting?

| username: TiDBer_JUi6UvZm | Original post link

Show the create table statements for both the new and old tables. It seems like they shouldn’t be inconsistent.

| username: zhaokede | Original post link

Let’s see if the stored characters are Chinese characters and if the two tables occupy different amounts of space.