Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Add a configuration option to the sync_diff comparison tool for specifying the data range of upstream tables
sync-diff currently supports setting a comparison range for the target table. Could it also support setting a comparison range for the source (upstream) data?
As I understand it, this range applies to both upstream and downstream, right?
I see that the documented parameter is target-tables. Does it really apply to both upstream and downstream?
If that’s the case, it would be better to configure them separately; otherwise the setting is ambiguous.
sync-diff works by checking whether the data in upstream table A and downstream table A are consistent.
Normally it splits the table into chunks, and the core of each chunk comparison is:
set snapshot = '2024xxx';
select xor() from t where row_id between and;
If you add a range, the query
select xor() from t where row_id between and;
changes to
select xor() from t where row_id between and and range between;
You don’t need to worry about rowid splitting.
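To make the chunk-plus-range idea concrete, here is a minimal sketch in Python using in-memory SQLite databases as stand-ins for upstream and downstream. It is not how sync-diff is actually implemented: the table t, its columns, the helper names (make_db, chunk_checksum, diff_chunks) and the CRC32-XOR checksum are all assumptions made for illustration. The point is only that the same range condition is appended to the chunk query on both sides.

```python
import sqlite3
import zlib


def make_db(rows):
    """Create an in-memory table t(id, age, val); a stand-in for one side."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, age INTEGER, val TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn


def chunk_checksum(conn, lo, hi, range_cond):
    """XOR the CRC32 of every row in [lo, hi] that also satisfies range_cond."""
    sql = (
        "SELECT id, age, val FROM t "
        f"WHERE id BETWEEN ? AND ? AND ({range_cond}) ORDER BY id"
    )
    checksum = 0
    for row in conn.execute(sql, (lo, hi)):
        checksum ^= zlib.crc32(repr(row).encode())
    return checksum


def diff_chunks(upstream, downstream, max_id, chunk_size, range_cond="1=1"):
    """Return the id chunks whose checksums differ between the two sides.

    The same range_cond is applied to BOTH upstream and downstream.
    """
    mismatched = []
    for lo in range(1, max_id + 1, chunk_size):
        hi = min(lo + chunk_size - 1, max_id)
        up = chunk_checksum(upstream, lo, hi, range_cond)
        down = chunk_checksum(downstream, lo, hi, range_cond)
        if up != down:
            mismatched.append((lo, hi))
    return mismatched


# Row 1 differs between the two sides, but its age (5) is outside the range.
upstream = make_db([(1, 5, "a"), (2, 15, "b"), (3, 18, "c"), (4, 30, "d")])
downstream = make_db([(1, 5, "x"), (2, 15, "b"), (3, 18, "c"), (4, 30, "d")])

all_diff = diff_chunks(upstream, downstream, max_id=4, chunk_size=2)
ranged_diff = diff_chunks(upstream, downstream, 4, 2, "age > 10 AND age < 20")

print(all_diff)     # [(1, 2)]
print(ranged_diff)  # []
```

Without a range, the chunk containing row 1 is reported as different. With the range applied to both sides, row 1 is filtered out of both queries, so nothing is reported. The chunk boundaries on id stay the same in both cases.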
If the upstream and downstream conditions are the same, there definitely won’t be any inconsistency. It’s just that this parameter should be placed in the global config rather than the target configuration.
I tested it, and the range does indeed apply to both upstream and downstream.