Issue of Inconsistent Table Structure in sync-diff-inspector Data Validation

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sync-diff-inspector数据校验表结构不一致问题

| username: 海石花47

[TiDB Usage Environment] Production Environment
[TiDB Version] TiDB 6.1.1, sync-diff-inspector v2.0
[Encountered Problem] sync-diff-inspector data validation table structure inconsistency (upstream MySQL, downstream TiDB)
[Reproduction Path]
Steps:

1. Use Dumpling to export the table schemas and data from the upstream MySQL
2. Use TiDB Lightning to import them into TiDB
3. In step 1, the exported table schemas automatically had the COLLATE=utf8_bin option added
4. Run sync-diff-inspector for data validation; it reports that the table structures of the two tables are inconsistent (a config sketch follows these steps)
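
For reference, a minimal sketch of what a sync-diff-inspector v2 config for this MySQL-to-TiDB comparison can look like; the hosts, user, and table pattern below are placeholders rather than values from the original post:

```toml
# Minimal sketch of a sync-diff-inspector v2 config (assumed layout);
# hosts, credentials, and table patterns are placeholders.
check-thread-count = 4
export-fix-sql = true

[data-sources]
[data-sources.mysql1]    # upstream MySQL
    host = "mysql-host"
    port = 3306
    user = "diff_user"
    password = ""

[data-sources.tidb0]     # downstream TiDB
    host = "tidb-host"
    port = 4000
    user = "diff_user"
    password = ""

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]
    target-instance = "tidb0"
    target-check-tables = ["mydb.*"]
```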

There are dozens of production tables waiting for data validation; what should I do?

| username: 海石花47 | Original post link

Is no one answering this question… :sob:

| username: 海石花47 | Original post link

Dear experts, please help me~ :sob:

| username: 裤衩儿飞上天 | Original post link

I think the problem is with the network. You can try to check the network connection between the TiDB server and the PD server.

| username: 海石花47 | Original post link

I have tried setting collation to “utf8_bin” or “utf8_general_ci” here, but neither works. How exactly should this parameter be used? Also, a question: it reports that the table structures are inconsistent, but it doesn’t indicate where the inconsistency is, and there is no hint or log output at all.
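
(For context, in the v2 config the collation option typically goes in a per-table config section; the following is only a sketch of where it sits, with the section name and table pattern as placeholders, not a verified fix.)

```toml
# Sketch only: assumes the v2 table-config "collation" option;
# "config1" and the table pattern are placeholders.
[task]
    # ...
    target-configs = ["config1"]

[table-configs.config1]
    target-tables = ["mydb.mytable"]
    collation = "utf8_general_ci"  # force the same collation on both sides for splitting and comparing
```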

| username: 裤衩儿飞上天 | Original post link

Check out the official documentation
sync-diff-inspector User Documentation | PingCAP Docs

| username: 海石花47 | Original post link

I’ve read it, several times… If you don’t believe me, go look for yourself: where does it say the tool will print what is different when the table structures don’t match? :joy:

| username: onlyacat | Original post link

I looked into data verification issues before, so I can share some insights.

The error is caused by the way this tool splits the table into chunks before comparing.

Chunking inevitably involves sorting and then splitting, and if the collations differ, the two sides end up splitting the data differently.

One solution is to explicitly specify the collation in the SELECT, but that forces an external sort, which greatly reduces efficiency.
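
(As a rough illustration of that workaround, the idea is to force one collation explicitly when sorting; the table and column names below are placeholders.)

```sql
-- Sketch: force a single collation when sorting, so upstream and downstream
-- order rows the same way (table/column names are placeholders).
SELECT id, name
FROM mydb.mytable
ORDER BY name COLLATE utf8_general_ci;
```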

In the end, we did not use this sync-diff tool and switched to another data verification tool.

| username: 海石花47 | Original post link

Which tool did you end up using?

| username: onlyacat | Original post link

It’s a tool provided by an upstream vendor; it isn’t publicly available yet, and we’ve made our own modifications to it.

| username: 海石花47 | Original post link

Is this the problem you mentioned? Changing the collation doesn’t work, and adding the COLLATE clause you mentioned doesn’t work either. What should I do?

| username: 海石花47 | Original post link

I followed the official documentation and changed the collation to match the upstream MySQL table (utf8_general_ci), and then I got this error… Seeking expert advice :sob:

| username: jansu-dev | Original post link

  1. It looks like the parameter suggested by 裤衩儿飞上天 has taken effect, because the error has changed. → diff: add collation config by WangXiangUSTC · Pull Request #75 · pingcap/tidb-tools · GitHub
  2. The count is not correct. That suggests there is an issue with the downstream data, right? For example, the index data might be inconsistent, so the count obtained through the index doesn’t match the actual row count.

| username: 海石花47 | Original post link

Here it is:

The issue I initially reported turned out to be caused by inconsistent table structures between upstream and downstream: the primary database had no index, while the secondary database did. Synchronization followed the primary, and verification followed the secondary, so my initial troubleshooting was off track. The error was not caused by collation but by the missing index.

The error screenshot I posted later was indeed due to incorrect chunk calculation, which made the verification fail.

Here are my solutions:

  1. Review the table structures involved in the verification and ignore all text, blob, and json columns. This is likely to be useful (see the sketch after this list).
  2. Adjust the chunk_size; sometimes it helps.
  3. Do not set verification rules for the table individually; let the tool scan the entire table. Sometimes this works.
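
A sketch of how solutions 1 and 2 could be expressed in a v2 table config; the section name, table pattern, column names, and chunk size are all placeholders:

```toml
# Sketch only: placeholders throughout.
[table-configs.config1]
    target-tables = ["mydb.mytable"]
    ignore-columns = ["big_text_col", "blob_col", "json_col"]  # skip text/blob/json columns (solution 1)
    chunk-size = 10000                                         # tune rows per chunk (solution 2)
```
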
| username: jansu-dev | Original post link

  1. The issue I initially reported was later found to be due to inconsistent table structures between upstream and downstream. The primary database had no index, while the secondary database had an index. Synchronization followed the primary, and verification followed the secondary. Therefore, the initial troubleshooting was incorrect. The error was not due to collation but was caused by the missing index.
    → This explanation is problematic. “Using sync-diff-inspector for data verification, it indicates that the table structures of the two tables are inconsistent” should indeed be pointing at the collation difference, and the collation workaround should be the fix for that problem. (The cause and effect in point 1 do not correspond.)

  2. The error screenshot I posted later was indeed due to incorrect chunk calculation, which made the verification fail.
    → An incorrect chunk calculation indicates data inconsistency. sync-diff divides chunks based on indexes and then computes the CRC32 checksum of each chunk on upstream and downstream at the same time. In other words, if a chunk does not match, the data is inconsistent (it could also be an issue with how chunks are divided between upstream and downstream). (An incorrect chunk calculation is not the root cause, only a symptom; see the checksum sketch after this list.)

  3. Check the table structures involved in the verification and ignore all text, blob, and json columns. This is likely useful.
    → This probably does help, but it’s unclear which step it bypasses to make that difference.

  4. Adjust the chunk_size; sometimes it helps.
    → For small tables, it might simply do a full table scan without using indexes to divide chunks, which may be why it feels similar to the next point. (Personal guess)

  5. Do not set the verification rules for this table separately; let the tool perform a full table scan operation. Sometimes it helps.
    → The phenomenon and underlying principle are also unclear.
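
To illustrate point 2: the per-chunk comparison is roughly a row count plus a CRC32/BIT_XOR checksum run on both sides; the exact query sync-diff builds differs, and the table, columns, and range below are placeholders.

```sql
-- Rough sketch of a per-chunk checksum query (placeholders throughout).
SELECT COUNT(*) AS cnt,
       BIT_XOR(CAST(CRC32(CONCAT_WS(',', id, name, CONCAT(ISNULL(id), ISNULL(name)))) AS UNSIGNED)) AS checksum
FROM mydb.mytable
WHERE id > 0 AND id <= 10000;
```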

However, I think there might be inconsistent index data in your downstream. :thinking: I suggest running ADMIN CHECK TABLE on it to avoid future production issues.
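
(For example, something like the following on the downstream TiDB; the database and table names are placeholders.)

```sql
-- Verify that the table's index data matches its row data in TiDB.
USE mydb;
ADMIN CHECK TABLE mytable;
```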

| username: 海石花47 | Original post link

Thank you for the reply~ The issue I described and the measures I took are aimed at scenarios with many databases and tables, not just a single database or table. At least for now, the daily verification tasks have stabilized and there are no other issues.

The official documentation also recommends ignoring certain types of columns.


| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.