Add Configuration for Specifying Upstream Table Data Range in sync_diff Comparison Tool

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sync_diff比对工具增加指定上游表数据范围配置

| username: h5n1

sync_diff currently supports setting a comparison range for the target table. Could it also support setting a comparison range for the source (upstream) data?

| username: WalterWj | Original post link

As I understand it, this range takes effect on both upstream and downstream, right?

| username: h5n1 | Original post link

The parameter I see described is target-tables. Does that affect whether the range applies to both upstream and downstream?

| username: xfworld | Original post link

If that’s the case, it’s best to configure them separately; otherwise there will be ambiguity:

  • source
  • target

| username: WalterWj | Original post link

sync-diff works by checking whether the data in upstream table A and downstream table A is consistent.
Normally it splits the table into batches, and the core of each check is:

set snapshot = '2024xxx';
select xor() from t where row_id between and;

If you add a range, the query changes from:
select xor() from t where row_id between and;
to:
select xor() from t where row_id between and and range between;

You don’t need to worry about the row_id splitting.
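The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not sync-diff's actual implementation: two in-memory SQLite databases stand in for upstream and downstream, the table `t` with columns `id` and `age` is hypothetical, the helpers `chunk_where` and `chunk_checksum` are made up for this example, and XOR-ing a CRC32 per row in Python stands in for the `xor()` in the queries above. The range condition `age > 10 AND age < 20` is likewise just an example value.

```python
import sqlite3
import zlib


def chunk_where(lo, hi, extra_range=None):
    """Build the WHERE clause for one chunk; extra_range is the optional user range."""
    where = f"id BETWEEN {lo} AND {hi}"
    if extra_range:
        # The user range is simply AND-ed onto the chunk condition.
        where += f" AND ({extra_range})"
    return where


def chunk_checksum(conn, where):
    """XOR a CRC32 of every row in the chunk (stand-in for the SQL-side xor)."""
    checksum = 0
    for row in conn.execute(f"SELECT id, age FROM t WHERE {where}"):
        checksum ^= zlib.crc32(repr(row).encode())
    return checksum


def make_db(rows):
    """Create a hypothetical table t(id, age) in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


upstream = make_db([(1, 5), (2, 15), (3, 18), (4, 40)])
downstream = make_db([(1, 5), (2, 15), (3, 18), (4, 99)])  # row 4 differs

full = chunk_where(1, 4)
ranged = chunk_where(1, 4, "age > 10 AND age < 20")

# The same WHERE clause is run against both sides.
full_match = chunk_checksum(upstream, full) == chunk_checksum(downstream, full)
ranged_match = chunk_checksum(upstream, ranged) == chunk_checksum(downstream, ranged)
print(full_match, ranged_match)
```

Because the identical condition is appended to the query on both sides, the chunk boundaries stay the same and only the rows inside the range are compared. In this sketch the differing row 4 falls outside the range, so the ranged comparison matches while the full comparison does not.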

| username: WalterWj | Original post link

As long as the same condition is applied upstream and downstream, the range itself won’t cause any inconsistency. It’s just that this parameter should be placed in the global config rather than in the target configuration.

| username: h5n1 | Original post link

I tested it, and the range does indeed take effect on both upstream and downstream.