How to Analyze Whether DML Statements Have Been Executed in TiDB Similar to mysqlbinlog in MySQL for Analyzing Binlog and Outputting Executed SQL Statements

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 如何分析有没有执行过DML语句 类似mysql 里面的mysqlbinlog 分析binlog 输出执行过的sql语句

| username: TiDBer_ZsnVPQB4

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.2.1
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
I would like to ask how TiDB can analyze whether certain DML statements have been executed and at what specific time, similar to the mysqlbinlog in MySQL that analyzes binlog to output executed SQL statements.

| username: 裤衩儿飞上天 | Original post link

  1. The commercial version can enable the audit feature.
  2. The community version can enable the tidb_general_log parameter and then check the logs in the corresponding tidb.log.
  3. Enabling tidb_general_log in a production environment may have a significant impact; you can use proxysql’s audit feature instead.
  4. Continuously capture and analyze network packets.
| username: ljluestc | Original post link

TiDB has a feature similar to MySQL’s mysqlbinlog called TiCDC (TiDB Change Data Capture). TiCDC captures and replicates DML (Data Manipulation Language) changes made to the TiDB cluster in real-time to downstream databases or message queues. TiCDC works by parsing TiDB’s binlog and extracting DML statements from it.

TiCDC provides a Command Line Interface (CLI) that can be used to configure and manage replication tasks. By using the cdc cli capture command, you can capture binlogs at a specific point in time and output the DML statements executed from that point onward.

Here is an example command you can use to capture binary logs and output DML statements:

cdc cli capture --pd-endpoints=pd1:2379,pd2:2379,pd3:2379 \
    --start-ts=425091232384233219 \
    --target-dir=/path/to/output_dir \
    --log-file=/path/to/log_file
| username: TiDBer_ZsnVPQB4 | Original post link

I know about ticdc, but it is just an incremental synchronization tool that can only analyze changelogs without landing. The main requirement is for TiDB to retain the change log of TiKV as a file; otherwise, if it doesn’t land, it cannot be analyzed after the GC time has passed.

| username: Running | Original post link

Enabling tidb_general_log can record logs, and you can set up a separate tidbserver to handle this.

| username: 啦啦啦啦啦 | Original post link

The tidb_general_log records all statements including SELECT, and even enabling it for a short period can generate a large volume of logs, so it is not recommended to keep it enabled for long periods. If you want to analyze binlogs similar to mysqlbinlog in MySQL, you should first use the Drainer in TiDB Binlog to output the binlog in protobuf format to a file. Then, use Reparo to parse the binlog in the file, and set the dest-type parameter to print to print out the contents of the binlog.

| username: ljluestc | Original post link

TiKV Raft logs are used to store changes made to the data in TiKV. Raft logs are replicated logs stored on each node in the cluster. Raft logs are used to maintain consistency between nodes, ensuring that all nodes have the same data. Raft logs are used by TiCDC to capture changes and send them to the TiDB Binlog. Once the changes are captured and written to the TiDB Binlog, TiDB can analyze the TiDB Binlog to determine which DML statements were executed.

| username: TiDBer_ZsnVPQB4 | Original post link

Isn’t TiCDC an upgraded version of binlog? This log still doesn’t land into a file, and there’s no analysis method like mysqlbinlog, so it’s impossible to analyze it afterward. For example, whether someone modified a certain table within a past time period, which specific fields were modified, and what values were changed from and to.

| username: weixiaobing | Original post link

You can enable the binlog and output the binlog to a file, which allows you to analyze the binlog. For specific parameters, you can refer to the official documentation: TiDB Binlog 简介 | PingCAP 文档中心

| username: ljluestc | Original post link

In TiDB, you can use TiDB Binlog to analyze executed DML statements and their execution times. TiDB Binlog captures changes made to the database and records them in a structured format.

To analyze executed DML statements and their execution times using TiDB Binlog, follow these steps:

  1. Enable TiDB Binlog: Ensure that your TiDB cluster has TiDB Binlog enabled. This can be done by setting the enable-binlog configuration parameter to true in the TiDB configuration file (tidb.toml).

  2. Configure Binlog Settings: Set the relevant configuration options for TiDB Binlog, such as binlog-format and binlog-socket. These settings determine the format and destination of the binlog data.

  3. Start TiDB Binlog: Restart the TiDB server after making the necessary configuration changes to enable TiDB Binlog.

  4. Use TiDB Binlog Tool: TiDB provides the tidb-binlog tool to process and analyze binlog data. You can use this tool to extract DML statements and their execution times from the binlog files.

    Example: tidb-binlog -s binlog.000001 > executed_dml_statements.sql

    This command extracts binlog data from binlog.000001 and outputs the executed DML statements to the file executed_dml_statements.sql.

    Note that you need to specify the correct binlog file name or use a wildcard (*) to process multiple binlog files.

  5. Analyze Executed DML Statements: After extracting the DML statements into a file, you can analyze them using various methods, such as directly viewing the file or loading it into a database or other tools for further analysis.

It is important to note that TiDB Binlog is primarily designed for replication purposes rather than general auditing or analysis. If you need detailed auditing or analysis of executed DML statements, you may consider using specialized tools or implementing custom logging or auditing mechanisms at your application layer.

Additionally, be aware of the performance impact of enabling TiDB Binlog, as it introduces additional overhead to the cluster. Properly configure and monitor binlog settings to ensure optimal performance for your specific use case.

| username: redgame | Original post link

So it really exists, I was looking for it too.

| username: TiDBer_6ID1U8NP | Original post link

Is there a download link for tidb-binlog? Thank you.