[TiDB Usage Environment] Production Environment
[TiDB Version] v5.2.1
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
I would like to ask how TiDB can analyze whether certain DML statements have been executed and at what specific time, similar to the mysqlbinlog in MySQL that analyzes binlog to output executed SQL statements.
TiDB has a feature similar to MySQL’s mysqlbinlog called TiCDC (TiDB Change Data Capture). TiCDC captures and replicates DML (Data Manipulation Language) changes made to the TiDB cluster in real-time to downstream databases or message queues. TiCDC works by parsing TiDB’s binlog and extracting DML statements from it.
TiCDC provides a Command Line Interface (CLI) that can be used to configure and manage replication tasks. By using the cdc cli capture command, you can capture binlogs at a specific point in time and output the DML statements executed from that point onward.
Here is an example command you can use to capture binary logs and output DML statements:
I know about ticdc, but it is just an incremental synchronization tool that can only analyze changelogs without landing. The main requirement is for TiDB to retain the change log of TiKV as a file; otherwise, if it doesn’t land, it cannot be analyzed after the GC time has passed.
The tidb_general_log records all statements including SELECT, and even enabling it for a short period can generate a large volume of logs, so it is not recommended to keep it enabled for long periods. If you want to analyze binlogs similar to mysqlbinlog in MySQL, you should first use the Drainer in TiDB Binlog to output the binlog in protobuf format to a file. Then, use Reparo to parse the binlog in the file, and set the dest-type parameter to print to print out the contents of the binlog.
TiKV Raft logs are used to store changes made to the data in TiKV. Raft logs are replicated logs stored on each node in the cluster. Raft logs are used to maintain consistency between nodes, ensuring that all nodes have the same data. Raft logs are used by TiCDC to capture changes and send them to the TiDB Binlog. Once the changes are captured and written to the TiDB Binlog, TiDB can analyze the TiDB Binlog to determine which DML statements were executed.
Isn’t TiCDC an upgraded version of binlog? This log still doesn’t land into a file, and there’s no analysis method like mysqlbinlog, so it’s impossible to analyze it afterward. For example, whether someone modified a certain table within a past time period, which specific fields were modified, and what values were changed from and to.
You can enable the binlog and output the binlog to a file, which allows you to analyze the binlog. For specific parameters, you can refer to the official documentation: TiDB Binlog 简介 | PingCAP 文档中心
In TiDB, you can use TiDB Binlog to analyze executed DML statements and their execution times. TiDB Binlog captures changes made to the database and records them in a structured format.
To analyze executed DML statements and their execution times using TiDB Binlog, follow these steps:
Enable TiDB Binlog: Ensure that your TiDB cluster has TiDB Binlog enabled. This can be done by setting the enable-binlog configuration parameter to true in the TiDB configuration file (tidb.toml).
Configure Binlog Settings: Set the relevant configuration options for TiDB Binlog, such as binlog-format and binlog-socket. These settings determine the format and destination of the binlog data.
Start TiDB Binlog: Restart the TiDB server after making the necessary configuration changes to enable TiDB Binlog.
Use TiDB Binlog Tool: TiDB provides the tidb-binlog tool to process and analyze binlog data. You can use this tool to extract DML statements and their execution times from the binlog files.
This command extracts binlog data from binlog.000001 and outputs the executed DML statements to the file executed_dml_statements.sql.
Note that you need to specify the correct binlog file name or use a wildcard (*) to process multiple binlog files.
Analyze Executed DML Statements: After extracting the DML statements into a file, you can analyze them using various methods, such as directly viewing the file or loading it into a database or other tools for further analysis.
It is important to note that TiDB Binlog is primarily designed for replication purposes rather than general auditing or analysis. If you need detailed auditing or analysis of executed DML statements, you may consider using specialized tools or implementing custom logging or auditing mechanisms at your application layer.
Additionally, be aware of the performance impact of enabling TiDB Binlog, as it introduces additional overhead to the cluster. Properly configure and monitor binlog settings to ensure optimal performance for your specific use case.