Issues with sync_differ_inspector

translator_bot · June 23, 2024, 3:41am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sync_differ_inspector问题

| username: h5n1

[Environment]
Source: v5.2.3, table with 200 million rows, approximately 167 varchar(100) fields, only 1 FILEDATE is an int field
Target: v6.1.1, empty table
sync_diff_inspector: v2.0
Table index:
TABLE_NAME varchar(100) NOT NULL,
FILE_NAME varchar(100) DEFAULT NULL,
FILEDATE int(11) NOT NULL,
PRIMARY KEY (TABLE_NAME,FILEDATE) /*T![clustered_index] NONCLUSTERED */,
KEY INX_USAIMS_FILENAME_20220913 (FILENAME),
KEY INX_USAIMS_CALLNUMBER_20220913 (CALLINGNUMBER,CALLEDNUMBER)
)
PARTITION BY LIST (FILEDATE)
(PARTITION P20220901 VALUES IN (20220901),
PARTITION P20220902 VALUES IN (20220902),
Configuration file parameters:
check-thread-count = 4
chunk-size = 10000
[Process]
Using one table for sync_diff check, using show processlist to check the executed SQL, only one full table scan:
select xxx as CHECKSUM FROM jiesuan.T_GIMS_USAGE_13_202209 WHERE ((TRUE) AND (TRUE))

[Questions]

From the show processlist results, it seems that chunks are not divided. Is chunk division only possible with single-column int type primary keys or unique indexes?
If chunks are not divided and a full table scan is performed, does the TiDB server store the entire table data in memory and release it only after the full table comparison is completed, or does it release part of the memory after comparing part of the data?
Does the check-thread-count concurrency apply to the number of threads for a single table or the number of threads for the entire database (e.g., checking a batch of tables through regex)? Why is the upstream slightly larger than this value, and by approximately how much?
check-thread-count # Number of threads for checking data, the number of connections to the upstream and downstream databases will be slightly larger than this value
After running for a while, the TiDB memory suddenly increases, causing OOM. Why does it suddenly increase? (Tried twice, the sudden increase occurred around 15 minutes)

image451×544 8.37 KB

translator_bot · June 23, 2024, 3:41am

| username: jansu-dev | Original post link

From the show processlist results, there is no chunk division. Is chunk range division only possible with a single-column Int type primary key or unique key index?
No.

image731×299 33.1 KB

There is an index, and the chunk size is divided according to this rule.
If chunks cannot be divided, after a full table scan, does the tidb server store the entire table data in memory, and does the comparison need to be fully completed before memory can be released, or can it release part of the memory after comparing part of the data?
The sync-diff comparison principle is to compare the crc32(row_values) of the source and target ends. Firstly, sync_diff is considered an external application to tidb. It sends SQL queries, and if everything is within one SQL query, plus calculating the crc32 result value, it probably won’t release memory (I guess).

translator_bot · June 23, 2024, 3:41am

| username: jansu-dev | Original post link

Does the concurrency scope of check-thread-count apply to the number of threads for a single table or the entire database (for example, validating a batch of tables through regex)? Why is the upstream slightly larger than this value, and by approximately how much?
This is the global producer of sync-diff (similar to a thread pool), and it is not necessarily that the upstream is higher than the downstream. All tasks will be distributed into this thread pool.
After running for a period of time, the memory usage of TiDB suddenly increases, leading to OOM. Why does it suddenly increase? (It happened twice, approximately 15 minutes into the run)
To diagnose OOM, you need to capture the TiDB profile at the point of high memory consumption to confirm the cause. It might be related to sync-diff (possible only).

translator_bot · June 23, 2024, 3:41am

| username: h5n1 | Original post link

The table has a composite primary key PRIMARY KEY (TABLE_NAME, FILEDATE).
It ran out of memory (OOM) before I could profile it.

translator_bot · June 23, 2024, 3:41am

| username: jansu-dev | Original post link

It seems that sync diff is still quite clumsy in handling composite primary keys.

image1179×183 39.6 KB
Is it a production environment? If there is nothing else running, only sync-diff, then it should be caused by this. If it really doesn’t work, try running it in ranges!

translator_bot · June 23, 2024, 3:41am

| username: h5n1 | Original post link

There are quite a few tables. I did CDC and want to verify data consistency. I think adding an index to the “filedate” integer column might work.

translator_bot · June 23, 2024, 3:41am

| username: jansu-dev | Original post link

Oh, you can test it in the testing environment.