Dumpling

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dumpling

| username: 胡杨树旁

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]


When exporting data with dumpling, conditions were added, but the number of rows in the logs during export does not match the number of rows queried in the database.

image

| username: okenJiang | Original post link

Could you please sanitize it and share your SQL statement?

| username: 胡杨树旁 | Original post link

/home/tidb/dumpling -h*** -uroot -p**** -P4000 --filetype sql --logfile /backup/test/1.log -T test.t1 -o /backup/test -t 6 -r 4000000 --where “id=‘AE3133021000020141’”

| username: Tank001 | Original post link

It’s a bit unsightly, is the ID not working?

| username: 考试没答案 | Original post link

Could it be a character set issue???
image
Different encodings lead to different matched data.

| username: 胡杨树旁 | Original post link

The exported data was restored, and the data volume with id=‘AE3133021000020141’ matches the data volume queried from the database. The total table data volume is over 70 million, and the data volume that meets the conditions is over 10 million.

| username: 胡杨树旁 | Original post link

I feel that this possibility is relatively small. I exported it once yesterday, but the condition was a combination of two conditions. The log shows that the amount of data exported is consistent with the amount of data queried from the database.

| username: 考试没答案 | Original post link

Create a temporary database. The data from Dumpling can be manually imported. Import it and then check the queries.

| username: 考试没答案 | Original post link

Please provide the table structure and the select query. Ensure to anonymize the data. The main focus is to examine the encoding rule issue.

| username: 我是咖啡哥 | Original post link

Could it be that this data is estimated based on statistical information or something similar?

| username: 胡杨树旁 | Original post link

(root@localhost)[test] 10:03:41>select count() from vsunusedmark;----
±---------+
| count(
) |
±---------+
| 44124732 |
±---------+
1 row in set (9.35 sec)

(root@localhost)[test] 10:04:02>show create table vsunusedmark\G
*************************** 1. row ***************************
Table: vsunusedmark
Create Table: CREATE TABLE vsunusedmark (
VISAPRE varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
VISASERIALNO varchar(100) COLLATE utf8mb4_bin NOT NULL,
VISACODE varchar(40) COLLATE utf8mb4_bin NOT NULL,
VISANAME varchar(200) COLLATE utf8mb4_bin NOT NULL,
PRESSBATCHNO varchar(32) COLLATE utf8mb4_bin NOT NULL,
BUSINESSNO varchar(100) COLLATE utf8mb4_bin DEFAULT NULL,
VISAAMOUNT decimal(15,3) DEFAULT NULL,
USEDATE datetime NOT NULL,
USERTYPE varchar(2) COLLATE utf8mb4_bin NOT NULL,
USERCODE varchar(40) COLLATE utf8mb4_bin NOT NULL,
USERNAME varchar(240) COLLATE utf8mb4_bin NOT NULL,
VISASTATUS varchar(40) COLLATE utf8mb4_bin NOT NULL,
PROVIDETIMES decimal(15,0) DEFAULT NULL,
RECYCLETIMES decimal(15,0) DEFAULT NULL,
VERIFIEDCANCELFLAG varchar(2) COLLATE utf8mb4_bin NOT NULL,
BEFORESTATUS varchar(40) COLLATE utf8mb4_bin DEFAULT NULL,
REMARK varchar(510) COLLATE utf8mb4_bin DEFAULT NULL,
FLAG varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
COMCODE varchar(40) COLLATE utf8mb4_bin DEFAULT NULL,
USEDENDDATE datetime DEFAULT NULL,
GDVISACODE varchar(510) COLLATE utf8mb4_bin DEFAULT NULL,
GDVISASERIALNO varchar(510) COLLATE utf8mb4_bin DEFAULT NULL,
USEDTIME datetime DEFAULT NULL COMMENT ‘Used Time’,
INVALIDTIME datetime DEFAULT NULL COMMENT ‘Invalid Time’,
COLLARTIME datetime DEFAULT NULL COMMENT ‘Collection Time’,
LOSTTIME datetime DEFAULT NULL COMMENT ‘Lost Time’,
UPDATETIME datetime DEFAULT NULL COMMENT ‘Last Update Time’,
DESTROYTIME datetime DEFAULT NULL COMMENT ‘Destruction Time’,
ATTRIBUTIVECOMCODE varchar(40) COLLATE utf8mb4_bin DEFAULT NULL COMMENT ‘Attributive Branch Code’,
PROVIDECOMTIME datetime DEFAULT NULL COMMENT ‘Department Distribution Time’,
TRANSSTATUSTIME datetime DEFAULT NULL COMMENT ‘Status Transition Time’,
PRIMARY KEY (VISACODE,VISASERIALNO),
KEY IDX_VSUNUSEDMARK_BUSINESSNO (BUSINESSNO),
KEY IDX_VSUNUSEDMARK_CODE_STAT (VISACODE,VISASTATUS),
KEY IDX_VSUNUSEDMARK_COMCODE (COMCODE),
KEY IDX_VSUNUSEDMARK_UCODE_VSTAT (VISASTATUS,USERCODE),
KEY IDX_VSUNUSEDMARK_UPDATETIME (UPDATETIME),
KEY IDX_VSUNUSEDMARK_USERCODE (USERCODE),
KEY IDX_VSUNUSEDMARK_VISASERIALNO (VISASERIALNO),
KEY IDX_VSUNUSEDMARK_VISA_USR (VISACODE,USERCODE),
KEY IDX_VSUNUSED_VCSN (VISASERIALNO,VISACODE),
KEY IDX_VSUNUSED_COMCODE_VISACODE (COMCODE,VISACODE),
KEY idx_vsunusedmark_acomcode (ATTRIBUTIVECOMCODE)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

| username: 胡杨树旁 | Original post link

I just created a temporary database to restore the exported data, and the amount of restored data matches the amount shown in the export logs.

| username: 考试没答案 | Original post link

Is the value 11136394 when you check the export database with the following dumpling where condition?

select count(1) from vsunusedmark where VISACODE = 'AE3133021000020141'
| username: 考试没答案 | Original post link

Could you please list the directory output from dumpling using the ls -al command? Try removing the -r parameter and check again.

| username: 胡杨树旁 | Original post link

Yes, exporting the database with conditions can match the data.

| username: 胡杨树旁 | Original post link

ls -al
total 14410956
drwxr-xr-x 2 root root 150 Feb 7 20:07 .
drwxr-xr-x 14 root root 287 Feb 7 21:07 …
-rw-r–r-- 1 root root 95 Feb 7 17:17 test-schema-create.sql
-rw-r–r-- 1 root root 12792558533 Feb 7 20:05 test.vsunusedmark.000000001.sql
-rw-r–r-- 1 root root 1964246386 Feb 7 20:05 test.vsunusedmark.000000002.sql
-rw-r–r-- 1 root root 1636 Feb 7 17:17 test.vsunusedmark-schema.sql

| username: 考试没答案 | Original post link

Both of these files have been imported, right?

| username: 考试没答案 | Original post link

Please also send a similar screenshot of this so I can take a look.

| username: 胡杨树旁 | Original post link

Yes, everything has been imported.

| username: 胡杨树旁 | Original post link

I’ll try exporting it again at noon. After discovering the issue yesterday, I didn’t add any conditions and exported the entire database once. The data volume matches up. I have deleted the data and logs from the first export where conditions were added.