Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Newbie question: What exactly is the Coprocessor?
I'm looking at the monitoring metrics and feeling confused.
Coprocessor Overview
- Request duration: The total time consumed from receiving the coprocessor request to the end of processing.
- Total Requests: The total ops of each type of request.
- Handle duration: A histogram of the time consumed per minute to actually process coprocessor requests.
- Total Request Errors: The number of coprocessor request errors per second. Normally, there should not be a large number of errors in a short period.
- Total KV Cursor Operations: The total ops of various types of KV cursor operations, such as select, index, analyze_table, analyze_index, checksum_table, checksum_index, etc.
- KV Cursor Operations: The number of various types of KV cursor operations per second, displayed in histogram form.
- Total RocksDB Perf Statistics: RocksDB performance statistics.
- Total Response Size: The size of the data responded by the coprocessor.
Coprocessor Detail
- Handle duration: A histogram of the time consumed per second to actually process coprocessor requests.
- 95% Handle duration by store: The 95th-percentile time spent processing coprocessor requests on each TiKV instance per second.
- Wait duration: The wait time of coprocessor requests per second; in 99.99% of cases this should be less than 10s.
- 95% Wait duration by store: The 95th-percentile wait time of coprocessor requests on each TiKV instance per second.
- Total DAG Requests: The total ops of DAG requests.
- Total DAG Executors: The total ops of DAG executors.
- Total Ops Details (Table Scan): The number of various events occurring per second during the scan process for select requests in the coprocessor.
- Total Ops Details (Index Scan): The number of various events occurring per second during the scan process for index requests in the coprocessor.
- Total Ops Details by CF (Table Scan): The number of various events occurring per second during the scan process for select requests for each CF in the coprocessor.
- Total Ops Details by CF (Index Scan): The number of various events occurring per second during the scan process for index requests for each CF in the coprocessor.
Compute pushdown involves pushing the computational operations originally performed on the TiDB Server down to TiKV, thereby enhancing performance through parallel computing.
From the official documentation, this might be a unique technology of TiDB [computation pushdown], where TiKV processes in parallel and then merges the results for TiDB [for some global sorting scenarios].
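The "merge the results for TiDB" part for a global-sorting scenario can be pictured as a k-way merge: each TiKV node sorts its own slice in parallel, and TiDB only merges the already-sorted streams. A minimal Python sketch (the per-region data here is made up for illustration):

```python
import heapq

# Hypothetical sorted partial results, one list per TiKV region.
# Sorting each slice is the work that was pushed down to TiKV.
region_results = [
    [1, 4, 7],
    [2, 5, 8],
    [3, 6, 9],
]

# TiDB merges the pre-sorted streams instead of re-sorting everything.
merged = list(heapq.merge(*region_results))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The merge step is O(n log k) for k regions, which is why letting the storage nodes do the sorting first pays off.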
This is not considered a technology unique to TiDB.
I suggest studying the PCTA and PCTP courses.
In simple terms, the Coprocessor is a module in TiKV that reads data and performs computations. This concept is inspired by HBase, and its implementation in TiDB is similar to the Endpoint part of the Coprocessor in HBase. It can also be compared to MySQL stored procedures.
You can check out this article from the source-code reading series: TiKV 源码解析系列文章(十四)Coprocessor 概览 (TiKV Source Code Reading Series, Part 14: Coprocessor Overview) | PingCAP
A module for data reading and computation in TiKV:
It has the following functions:
- Data Reading and Computation: When TiDB receives a query request, it generates a physical execution plan based on the query content and converts these plans into Coprocessor requests. These requests are sent to TiKV nodes, where the Coprocessor is responsible for executing data filtering and aggregation operations.
- Result Caching: This involves caching the results of computations pushed down to TiKV on the TiDB instance side. This can accelerate query efficiency in specific scenarios by avoiding the repeated computation of the same data.
- Performance Optimization: By pushing part of the data processing work down to the storage layer, the Coprocessor helps reduce the computational burden on the TiDB layer, lower response latency, and improve overall system performance.
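The result-caching idea above can be sketched as a cache keyed by the request plus a data version, so that a write to the region invalidates reuse. This is a toy illustration only; all names here are made up and do not reflect TiDB's actual implementation:

```python
# Toy sketch of coprocessor result caching (illustrative names only).
cache = {}

def coprocessor_read(request, region, compute):
    # Key includes the region's data version: any write bumps the
    # version, so stale results are never served.
    key = (request, region["id"], region["data_version"])
    if key in cache:
        return cache[key]             # cache hit: skip recomputation
    result = compute(region["rows"])  # the pushed-down computation
    cache[key] = result
    return result

region = {"id": 1, "data_version": 5, "rows": [3, 1, 2]}
print(coprocessor_read("sum", region, sum))  # computed: 6
print(coprocessor_read("sum", region, sum))  # served from cache: 6
region["data_version"] = 6                   # a write invalidates the entry
print(coprocessor_read("sum", region, sum))  # recomputed: 6
```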
A computing processor on TiKV
This document explains everything very thoroughly 
Word Definition:
US ['koʊˌproʊsesə] UK ['kəʊˌprəʊsesə]
n. [Computing] coprocessor; auxiliary processor
Web definitions: coprocessor; coprocessing unit; auxiliary processor
Plural form: coprocessors
Hardware Coprocessor Definition:
A coprocessor is a type of chip used to offload specific processing tasks from the system microprocessor. It is a processor developed and applied to assist the central processing unit (CPU) in completing tasks that it cannot perform or performs inefficiently.
- Chinese name: 协处理器
- English name: coprocessor
- Type: a type of chip
- Purpose: offloads specific tasks from the system microprocessor
- Function: assists the CPU in completing tasks it cannot perform or performs inefficiently
Coprocessor Analysis in TiDB:
TiKV Coprocessor currently handles three main types of read requests:
- DAG: Executes physical operators to compute intermediate results for SQL, thereby reducing TiDB's computational and network overhead. This is the task executed by the Coprocessor in the vast majority of scenarios.
- Analyze: Analyzes table data, collects and samples table data information, which is then persisted and used by TiDB's optimizer.
- CheckSum: Verifies table data, used for consistency checks after data import.
Based on everyone’s replies, I have slightly organized the information.
The Coprocessor is a module in TiKV responsible for reading and computing, mainly used to accelerate query requests and is unrelated to write requests. It can push down some operations from TiDB to KV nodes, avoiding the need to synchronize all data to TiDB nodes for computation.
Currently, TiKV Coprocessor handles three main types of read requests:
- DAG (Directed Acyclic Graph): Executes physical operators to compute intermediate results for SQL, thereby reducing TiDB’s computation and network overhead. This is the task executed by the Coprocessor in most scenarios.
- Analyze: Analyzes table data, collects and samples table data information, which is then used by TiDB’s optimizer after being persisted.
- CheckSum: Verifies table data for consistency checks after data import.
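The CheckSum use case can be pictured as each TiKV computing a checksum over its own region's rows, with the combined value compared against a checksum of the source data. A toy sketch; crc32 and XOR-combining are illustrative choices here, not TiKV's actual algorithm:

```python
import zlib

# Illustrative per-region checksum: hash each row and XOR-combine, so the
# result is independent of row order and of how rows split across regions.
def region_checksum(rows):
    acc = 0
    for row in rows:
        acc ^= zlib.crc32(row.encode())
    return acc

# Two regions holding the imported table's rows.
regions = [["a", "b"], ["c"]]
cluster_checksum = 0
for rows in regions:
    cluster_checksum ^= region_checksum(rows)

# Compare against a checksum computed the same way over the source data.
expected = region_checksum(["a", "b", "c"])
print(cluster_checksum == expected)  # True
```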
Grafana > TiKV-Details > Coprocessor Overview > Total KV Cursor Operations:
This includes the total number of various types of KV cursor operations, such as select, index, analyze_table, analyze_index, checksum_table, checksum_index, etc.
Execution process of read requests:
- TiDB receives the query statement, analyzes it, calculates the physical execution plan, and organizes it into a Coprocessor request for TiKV.
- TiDB distributes the Coprocessor request to all relevant TiKVs based on the data distribution.
- TiKV, upon receiving the Coprocessor request, filters and aggregates the data according to the request operators, then returns the results to TiDB.
- TiDB, after receiving all the returned data, performs a secondary aggregation and computes the final result, which is then returned to the client.
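The four steps above amount to a scatter-gather: each "TiKV" filters and partially aggregates its own rows, and "TiDB" performs the secondary aggregation. A toy sketch with made-up data and names:

```python
# Toy scatter-gather sketch of the read path (illustrative only).
regions = {
    "tikv-1": [1, 5, 9],
    "tikv-2": [2, 6, 10],
    "tikv-3": [3, 7, 11],
}

def coprocessor(rows, predicate):
    """Step 3: each TiKV filters and partially aggregates locally."""
    kept = [r for r in rows if predicate(r)]
    return sum(kept)

# Steps 1-2: TiDB builds the request (filter: value > 4) and scatters
# it to every TiKV that holds relevant data.
partials = [coprocessor(rows, lambda r: r > 4) for rows in regions.values()]

# Step 4: TiDB does the secondary aggregation over the partial results.
print(sum(partials))  # 48
```

Only the small partial sums cross the network, not the raw rows, which is the whole point of the pushdown.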
Well summarized, easy to understand 
Give it a thumbs up, learned something new.
Let the storage nodes share the SQL computation pressure of the compute nodes.