Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: sync_diff_inspector做上下游数据对比时可能会导致集群性能抖动

[Problem Scenario Involved]
Using CDC for high availability between clusters, when using sync_diff_inspector to perform full data comparison between upstream and downstream clusters, if the large table is not a clustered index table, it may cause cluster performance jitter and affect other businesses.
[Expected Requirement Behavior]
Hope to optimize for non-clustered index tables, such as using the _tidb_rowid pseudo column for splitting (but the basis for chunk division is difficult to determine), or as a fallback, force the use of indexes and use hints to push down aggregation (USE INDEX(xxx), AGG_TO_COP()), try to make the computation happen on TiKV.
[Background Information]
The principle of sync_diff_inspector for table data comparison is to split all data into multiple chunks, calculate the crc32 checksum (i.e., checksum) of all data rows in a chunk each time, and if the checksum of this chunk is consistent, then this batch of records is consistent until the entire table is compared. There are two points here: 1. The basis for splitting is to determine the splitting range according to the Bound of show stats_buckets. 2. The choice of splitting fields is to prioritize the primary key, then the unique index, and then determine which index to use to split the table into multiple chunks according to the value size: COUNT(DISTINCT a)/COUNT(*).
The statement to get the checksum value of the chunk is:
select count(*), bit_xor(crc32(...)) as checksum from t where col1 > 'a' and col1 <='b'
Analyze the following situations:
- When the table has no primary key, the optimizer may choose based on execution cost:
- Full table scan, which may cause a large number of scans on TiKV, occupying a lot of CPU;
- Index back to the table, originally bit_xor, crc32 and other functions can be pushed down to TiKV for execution, because the index back to the table and the optimizer evaluates that the aggregation is not pushed down, it may cause all records of the chunk to be returned to the TiDB server to calculate bit_xor and crc32, mainly a large amount of data transfer may cause grpc congestion between TiKV and TiDB, causing cluster TP business jitter.
- Index back to the table and aggregation occurs at the TiKV layer, this situation is relatively optimal, but the index back to the table query based on _tidb_rowid will cause a large number of seek operations, if the index order and _tidb_rowid order are inconsistent, it cannot be optimized into a next operation, resulting in low efficiency.
-
When the table has a primary key, but the table is a non-clustered index table, then the primary key is still a secondary index, and similar situations to 1 will occur.
-
When the table has a primary key and the table is a clustered index table, sync_diff_inspector will always prioritize the primary key to divide the chunk range, because the primary key is a first-level index, so range scans will always go through table range scan, because there is no index back to the table problem and no full table scan, so not only can the bit_xor, crc32 functions be pushed down to TiKV for computation, but also no full table scan occurs, so the efficiency is optimal.
In summary, clustered index tables are always efficient and have little impact on the system (stable) when performing full table data comparison. Non-clustered index tables may cause cluster performance jitter, so it is recommended to use the _tidb_rowid pseudo column for splitting for non-clustered index tables, but the choice of Bound is another challenge (1. For auto-increment or default cases, evenly divide according to the maximum and minimum values; 2. For scattered cases, can the start_key and stop_key of the table’s region be converted to the corresponding _tidb_rowid to define the range?)
Below are related tests for clustered index tables and non-clustered index tables:
-- Clustered index table
Create Table: CREATE TABLE `orders` (
`O_ORDERKEY` bigint(20) NOT NULL,
`O_CUSTKEY` bigint(20) NOT NULL,
`O_ORDERSTATUS` char(1) NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) NOT NULL,
`O_CLERK` char(15) NOT NULL,
`O_SHIPPRIORITY` bigint(20) NOT NULL,
`O_COMMENT` varchar(79) NOT NULL,
PRIMARY KEY (`O_ORDERKEY`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
-- Non-clustered index table
CREATE TABLE `orders_bak` (
`O_ORDERKEY` bigint(20) NOT NULL,
`O_CUSTKEY` bigint(20) NOT NULL,
`O_ORDERSTATUS` char(1) NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) NOT NULL,
`O_CLERK` char(15) NOT NULL,
`O_SHIPPRIORITY` bigint(20) NOT NULL,
`O_COMMENT` varchar(79) NOT NULL,
PRIMARY KEY (`O_ORDERKEY`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
The statement and execution plan for obtaining the checksum value of the chunk for the clustered index table:
SELECT
COUNT(*) AS CNT,
BIT_XOR(
CAST(
CRC32(
CONCAT_WS(
',',
`O_ORDERKEY`,
`O_CUSTKEY`,
`O_ORDERSTATUS`,
`O_TOTALPRICE`,
`O_ORDERDATE`,
`O_ORDERPRIORITY`,
`O_CLERK`,
`O_SHIPPRIORITY`,
`O_COMMENT`,
CONCAT(
ISNULL(`O_ORDERKEY`),
ISNULL(`O_CUSTKEY`),
ISNULL(`O_ORDERSTATUS`),
ISNULL(`O_TOTALPRICE`),
ISNULL(`O_ORDERDATE`),
ISNULL(`O_ORDERPRIORITY`),
ISNULL(`O_CLERK`),
ISNULL(`O_SHIPPRIORITY`),
ISNULL(`O_COMMENT`)
)
)
) AS UNSIGNED
)
) AS CHECKSUM
FROM
`tpch10`.`orders`
WHERE
(
(
((`O_ORDERKEY` > '8833953'))
AND ((`O_ORDERKEY` <= '9668836'))
)
AND (TRUE)
);
-- Execution plan:
+-----------------------------+-----------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-----------------------------+-----------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| HashAgg_12 | 1.00 | 1 | root | | time:516.3µs, loops:2, RU:0.475597, partial_worker:{wall_time:496.366µs, concurrency:5, task_num:1, tot_wait:1.966037ms, tot_exec:4.041µs, tot_time:1.973846ms, max:397.408µs, p95:397.408µs}, final_worker:{wall_time:0s, concurrency:5, task_num:1, tot_wait:2.083908ms, tot_exec:5.721µs, tot_time:2.091214ms, max:424.129µs, p95:424.129µs} | funcs:count(Column#12)->Column#10, funcs:bit_xor(Column#13)->Column#11 | 13.6 KB | N/A |
| └─TableReader_13 | 1.00 | 1 | root | | time:399.1µs, loops:2, cop_task: {num: 1, max: 398.2µs, proc_keys: 0, tot_proc: 1.79µs, tot_wait: 59.6µs, rpc_num: 1, rpc_time: 382.4µs, copr_cache_hit_ratio: 1.00, build_task_duration: 9.15µs, max_distsql_concurrency: 1} | data:HashAgg_6 | 275 Bytes | N/A |
| └─HashAgg_6 | 1.00 | 1 | cop[tikv] | | tikv_task:{time:209ms, loops:204}, scan_detail: {get_snapshot_time: 16.7µs, rocksdb: {block: {}}} | funcs:count(1)->Column#12, funcs:bit_xor(cast(crc32(concat_ws(",", cast(tpch10.orders.o_orderkey, var_string(20)), cast(tpch10.orders.o_custkey, var_string(20)), tpch10.orders.o_orderstatus, cast(tpch10.orders.o_totalprice, var_string(18)), cast(tpch10.orders.o_orderdate, var_string(10)), tpch10.orders.o_orderpriority, tpch10.orders.o_clerk, cast(tpch10.orders.o_shippriority, var_string(20)), tpch10.orders.o_comment, "000000000")), bigint(22) UNSIGNED BINARY))->Column#13 | N/A | N/A |
| └─TableRangeScan_11 | 203152.59 | 208723 | cop[tikv] | table:orders | tikv_task:{time:73ms, loops:204} | range:(8833953,9668836], keep order:false | N/A | N/A |
+-----------------------------+-----------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
4 rows in set (0.00 sec)
It can be seen that it goes through table range scan and the functions are pushed down to TiKV for execution, and only the aggregated few rows of records are returned to the TiDB server (there is only one region here, which is one row of records).
The statement and execution plan for obtaining the checksum value of the chunk for the non-clustered index table:
SELECT
COUNT(*) AS CNT,
BIT_XOR(
CAST(
CRC32(
CONCAT_WS(
',',
`O_ORDERKEY`,
`O_CUSTKEY`,
`O_ORDERSTATUS`,
`O_TOTALPRICE`,
`O_ORDERDATE`,
`O_ORDERPRIORITY`,
`O_CLERK`,
`O_SHIPPRIORITY`,
`O_COMMENT`,
CONCAT(
ISNULL(`O_ORDERKEY`),
ISNULL(`O_CUSTKEY`),
ISNULL(`O_ORDERSTATUS`),
ISNULL(`O_TOTALPRICE`),
ISNULL(`O_ORDERDATE`),
ISNULL(`O_ORDERPRIORITY`),
ISNULL(`O_CLERK`),
ISNULL(`O_SHIPPRIORITY`),
ISNULL(`O_COMMENT`)
)
)
) AS UNSIGNED
)
) AS CHECKSUM
FROM
`tpch10`.`orders_bak`
WHERE
(
(
((`O_ORDERKEY` > '8833953'))
AND ((`O_ORDERKEY` <= '9668836'))
)
AND (TRUE)
);
-- Execution plan:
+------------------------------+-------------+----------+-----------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+------------------------------+-------------+----------+-----------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| HashAgg_13 | 1.00 | 1 | root | | time:3.54s, loops:2, RU:35679.746372, partial_worker:{wall_time:3.537738662s, concurrency:5, task_num:1, tot_wait:17.68843555s, tot_exec:3.767µs, tot_time:17.688443423s, max:3.537694388s, p95:3.537694388s}, final_worker:{wall_time:0s, concurrency:5, task_num:1, tot_wait:17.688526743s, tot_exec:8.012µs, tot_time:17.688535975s, max:3.53771454s, p95:3.53771454s} | funcs:count(Column#13)->Column#11, funcs:bit_xor(Column#14)->Column#12 | 13.6 KB | N/A |
| └─TableReader_14 | 1.00 | 1 | root | | time:3.54s, loops:2, cop_task: {num: 2, max: 3.54s, min: 2.55s, avg: 3.04s, p95: 3.54s, max_proc_keys: 7939404, p95_proc_keys: 7939404, tot_proc: 6.07s, tot_wait: 129.2µs, rpc_num: 2, rpc_time: 6.09s, copr_cache_hit_ratio: 0.00, build_task_duration: 12.3µs, max_distsql_concurrency: 2} | data:HashAgg_6 | 436 Bytes | N/A |
| └─HashAgg_6 | 1.00 | 1 | cop[tikv] | | tikv_task:{proc max:3.53s, min:2.55s, avg: 3.04s, p80:3.53s, p95:3.53s, iters:13561, tasks:2}, scan_detail: {total_process_keys: 13884864, total_process_keys_size: 2205608093, total_keys: 13884866, get_snapshot_time: 48.3µs, rocksdb: {key_skipped_count: 13884864, block: {cache_hit_count: 46859, read_count: 26392, read_byte: 507.9 MB, read_time: 183.2ms}}} | funcs:count(1)->Column#13, funcs:bit_xor(cast(crc32(concat_ws(",", cast(tpch10.orders_bak.o_orderkey, var_string(20)), cast(tpch10.orders_bak.o_custkey, var_string(20)), tpch10.orders_bak.o_orderstatus, cast(tpch10.orders_bak.o_totalprice, var_string(18)), cast(tpch10.orders_bak.o_orderdate, var_string(10)), tpch10.orders_bak.o_orderpriority, tpch10.orders_bak.o_clerk, cast(tpch10.orders_bak.o_shippriority, var_string(20)), tpch10.orders_bak.o_comment, "000000000")), bigint(22) UNSIGNED BINARY))->Column#14 | N/A | N/A |
| └─Selection_12 | 258213.41 | 208723 | cop[tikv] | | tikv_task:{proc max:3.53s, min:2.41s, avg: 2.97s, p80:3.53s, p95:3.53s, iters:13561, tasks:2} | gt(tpch10.orders_bak.o_orderkey, 8833953), le(tpch10.orders_bak.o_orderkey, 9668836) | N/A | N/A |
| └─TableFullScan_11 | 17139728.00 | 13884864 | cop[tikv] | table:orders_bak | tikv_task:{proc max:3.42s, min:2.33s, avg: 2.87s, p80:3.42s, p95:3.42s, iters:13561, tasks:2} | keep order:false | N/A | N/A |
+------------------------------+-------------+----------+-----------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
5 rows in set (3.54 sec)
-- Force it to use the primary key index
+----------------------------------+-----------+---------+-----------+---------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------------+-----------+---------+-----------+---------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| HashAgg_13 | 1.00 | 1 | root | | time:83.9ms, loops:2, RU:753.270397, partial_worker:{wall_time:83.911166ms, concurrency:5, task_num:1, tot_wait:419.244414ms, tot_exec:14.561µs, tot_time:419.272395ms, max:83.867643ms, p95:83.867643ms}, final_worker:{wall_time:83.924262ms, concurrency:5, task_num:1, tot_wait:419.433702ms, tot_exec:26.633µs, tot_time:419.463128ms, max:83.90034ms, p95:83.90034ms} | funcs:count(Column#14)->Column#11, funcs:bit_xor(Column#15)->Column#12 | 13.6 KB | N/A |
| └─IndexLookUp_14 | 1.