Will TiDB Distributed Transactions Roll Back Completely if They Fail in the Prepare Phase?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: If a TiDB distributed transaction fails in the prepare phase, is the whole transaction rolled back?

| username: alfred

| username: ddhe9527

TiDB uses the Percolator transaction model, in which the first phase is called prewrite, not prepare. If prewrite fails, the entire transaction is rolled back. The main steps of the prewrite phase are listed below. Step 3 performs the version check and the lock conflict check, and failing either check causes the transaction to be rolled back.

  1. The COMMIT statement sends a prewrite request to the Percolator worker.

  2. Among all the write operations in the transaction, one is chosen as the primary (in TiDB, the first row written by the transaction is taken as the primary), and the rest are secondaries. The primary serves as the synchronization point for the entire transaction: its state records the transaction's status.

  3. The primary is prewritten first; once that succeeds, the secondaries are prewritten. For each write, the following checks must pass before the next step can proceed:
    3.1. Version check: check whether the write column of the row being written has a version newer than start_ts. If it does, another transaction has already committed a write to that row (a version conflict), and the whole transaction is aborted immediately.
    3.2. Lock conflict check: check whether the lock column of the row being written holds a lock. If it does, another transaction is currently writing that row, and the whole transaction is aborted immediately.

  4. Once the checks pass, write the data to the data column (persisting it) with start_ts as its version, but do not write to the write column. The new data is therefore not yet visible to readers.

  5. Lock the row by writing lock information to its lock column: the primary row's lock is marked as primary, while each secondary row's lock records the row key and column of the primary row.
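The prewrite steps above can be sketched as a small in-memory model. This is a hypothetical illustration, not TiDB or TiKV code: `Row`, `Store`, `prewrite_one`, `prewrite`, and `rollback` are names made up here, each row keeps the data/lock/write columns as plain dictionaries keyed by timestamp, the first mutation is treated as the primary as described in step 2, and a failed check simply erases whatever this transaction already prewrote. Real systems handle conflicts (waiting on locks, resolving expired locks) in more elaborate ways.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical row model: data/lock/write columns, each keyed by timestamp."""
    data: dict = field(default_factory=dict)   # start_ts -> value
    lock: dict = field(default_factory=dict)   # start_ts -> "PRIMARY" or the primary's key
    write: dict = field(default_factory=dict)  # commit_ts -> start_ts of the committed data


class Store:
    """A toy key-value store; rows are created on first access."""

    def __init__(self):
        self.rows = {}

    def row(self, key):
        return self.rows.setdefault(key, Row())


class PrewriteConflict(Exception):
    pass


def prewrite_one(store, key, value, start_ts, primary):
    row = store.row(key)
    # Step 3.1, version check: a commit at or after start_ts is a write-write conflict.
    if any(commit_ts >= start_ts for commit_ts in row.write):
        raise PrewriteConflict(f"write conflict on {key!r}")
    # Step 3.2, lock conflict check: any lock means another transaction is writing this row.
    if row.lock:
        raise PrewriteConflict(f"{key!r} is locked by another transaction")
    # Step 4: persist the value at start_ts; with no write record it stays invisible.
    row.data[start_ts] = value
    # Step 5: the primary's lock marks itself; secondaries point back at the primary.
    row.lock[start_ts] = "PRIMARY" if key == primary else primary


def rollback(store, keys, start_ts):
    # Undo this transaction's prewrites by dropping its locks and uncommitted data.
    for key in keys:
        row = store.row(key)
        row.lock.pop(start_ts, None)
        row.data.pop(start_ts, None)


def prewrite(store, mutations, start_ts):
    """Prewrite all (key, value) pairs; the first key is the primary. True on success."""
    primary = mutations[0][0]
    done = []
    try:
        # Sequential in this sketch: the primary goes first because it is mutations[0].
        for key, value in mutations:
            prewrite_one(store, key, value, start_ts, primary)
            done.append(key)
    except PrewriteConflict:
        # Any failed check aborts the whole transaction.
        rollback(store, done, start_ts)
        return False
    return True
```

For example, prewriting `[("a", 1), ("b", 2)]` at `start_ts=10` leaves `a` locked as the primary and `b` locked with a pointer to `a`. A second transaction that then tries to write `b` fails the lock check and erases everything it had already prewritten, which is the "roll back the whole transaction" behavior described above.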

| username: alfred

Right, it is the prewrite phase. So if prewrite succeeds and the transaction enters the second phase, the commit phase, is the distributed transaction guaranteed to succeed?

| username: ddhe9527

Yes. The transaction's outcome is effectively determined once prewrite completes, which is why an optimization like async commit is possible.
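A companion sketch of the classic Percolator commit phase helps show why so little can go wrong after a successful prewrite. It again uses a hypothetical in-memory row model; `Row`, `commit`, and `resolve_secondary` are illustrative names, not TiDB APIs. The commit point is writing the primary's write record. After that, a leftover secondary lock can be rolled forward by any reader that finds the primary committed. Mirroring the Percolator paper, the one failure case left in this classic flow is the primary lock having disappeared before commit, for example because another transaction cleaned up an expired lock.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical row model: data/lock/write columns keyed by timestamp."""
    data: dict = field(default_factory=dict)   # start_ts -> value
    lock: dict = field(default_factory=dict)   # start_ts -> "PRIMARY" or the primary's key
    write: dict = field(default_factory=dict)  # commit_ts -> start_ts


def commit(rows, keys, start_ts, commit_ts):
    """Second phase after a successful prewrite; keys[0] is the primary."""
    primary = rows[keys[0]]
    # If the primary lock is gone (e.g. another transaction cleaned up an
    # expired lock), the transaction has been rolled back and cannot commit.
    if start_ts not in primary.lock:
        return False
    # Commit point: record the commit on the primary and release its lock.
    primary.write[commit_ts] = start_ts
    del primary.lock[start_ts]
    # Secondaries: the transaction is already committed here, so an
    # interruption is harmless; readers can roll these locks forward.
    for key in keys[1:]:
        row = rows[key]
        row.write[commit_ts] = start_ts
        row.lock.pop(start_ts, None)
    return True


def resolve_secondary(rows, key, start_ts):
    """What a reader does when it meets a leftover secondary lock."""
    row = rows[key]
    primary = rows[row.lock[start_ts]]
    for commit_ts, ts in primary.write.items():
        if ts == start_ts:
            # Primary committed: roll the secondary forward with the same commit_ts.
            row.write[commit_ts] = start_ts
            del row.lock[start_ts]
            return "committed"
    if start_ts in primary.lock:
        # Classic 2PC: undecided until the primary commits (a real reader
        # would wait or check the lock's TTL).
        return "pending"
    # Primary neither committed nor locked: the transaction was rolled back.
    del row.lock[start_ts]
    row.data.pop(start_ts, None)
    return "rolled back"
```

Async commit, mentioned in the answer above, goes a step further: roughly speaking, the primary lock also records the secondary keys, so the outcome can be decided from the prewrite locks of all keys without waiting for the primary's write record.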