In the TiKV code, the Rollback interface normally writes a rollback record to the write CF. That way, if a prewrite for the same transaction arrives after the rollback has completed (for example, a request that was delayed or retried because of network problems), prewrite detects the rollback record and aborts.
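To make the expected behavior concrete, here is a minimal sketch of how a rollback record can stop a late prewrite. It is a toy model, not TiKV's actual code: `ToyStore`, `PrewriteError`, and both methods are hypothetical names, and the lock CF and write CF are reduced to a map and a set.

```rust
use std::collections::{HashMap, HashSet};

/// Why a prewrite was refused in this toy model.
#[derive(Debug, PartialEq, Eq)]
pub enum PrewriteError {
    /// A rollback record for the same start_ts already exists.
    AlreadyRolledBack,
    /// Another transaction (with the given start_ts) holds the lock.
    LockedByOther(u64),
}

/// Hypothetical, heavily simplified stand-in for one TiKV store.
#[derive(Default)]
pub struct ToyStore {
    /// Lock CF: key -> start_ts of the transaction holding the lock.
    locks: HashMap<String, u64>,
    /// Rollback records in the write CF, identified by (key, start_ts).
    rollbacks: HashSet<(String, u64)>,
}

impl ToyStore {
    /// Roll back `start_ts` on `key`: drop its lock and leave a rollback record.
    pub fn rollback(&mut self, key: &str, start_ts: u64) {
        if self.locks.get(key) == Some(&start_ts) {
            self.locks.remove(key);
        }
        self.rollbacks.insert((key.to_string(), start_ts));
    }

    /// Prewrite `key` for `start_ts`, refusing if that transaction was rolled back.
    pub fn prewrite(&mut self, key: &str, start_ts: u64) -> Result<(), PrewriteError> {
        if self.rollbacks.contains(&(key.to_string(), start_ts)) {
            return Err(PrewriteError::AlreadyRolledBack);
        }
        if let Some(&owner) = self.locks.get(key) {
            if owner != start_ts {
                return Err(PrewriteError::LockedByOther(owner));
            }
        }
        self.locks.insert(key.to_string(), start_ts);
        Ok(())
    }
}
```

In this model, calling `rollback("k", 10)` and then `prewrite("k", 10)` returns `Err(PrewriteError::AlreadyRolledBack)`, while a prewrite with a different start_ts is unaffected.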
However, there is a special scenario in the Rollback code where it does not write rollback records to the write CF. The code is as follows:
So in this case nothing is written at all, not even the overlapped rollback flag. How does prewrite recognize this case and refuse to proceed? Or is there some mechanism that guarantees a prewrite will not arrive again after the rollback because of network delays or retries?
So subsequently, the primary of pessimistic transactions was protected…
However, do optimistic transactions not have the same problem? Optimistic transactions do not need to check the lock, so why are they not protected as well?
A non-protected rollback of transaction T is used when T is known for certain to be rolled back and no concurrent process can be trying to commit it. This can happen when the transaction rolls itself back (note that this is not a ROLLBACK statement, but the rollback performed after a commit fails midway).
A protected rollback of transaction T is used when it is uncertain whether T may be committed concurrently, so the rollback must be guaranteed to take effect. This requires that the rollback record being written can never be discarded, and that any concurrent commit that sees the record fails. This can happen when another transaction resolves T's locks and decides to roll T back.
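The difference can be sketched with a small toy model. Everything here is hypothetical (`ToyKey`, `RollbackKind`, `TxnError`, and the `collapse` pass are invented for illustration); the only property taken from the description above is that a non-protected record may be discarded while a protected one may not.

```rust
use std::collections::HashMap;

/// The two kinds of rollback record described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RollbackKind {
    NonProtected,
    Protected,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TxnError {
    RolledBack,
    LockNotFound,
}

/// Toy state of a single key (hypothetical, not TiKV's real layout).
#[derive(Default)]
pub struct ToyKey {
    /// start_ts of the transaction currently holding the lock, if any.
    lock: Option<u64>,
    /// Rollback records on this key: start_ts -> kind.
    rollbacks: HashMap<u64, RollbackKind>,
}

impl ToyKey {
    /// Lock the key unless a rollback record for `start_ts` is still present.
    pub fn prewrite(&mut self, start_ts: u64) -> Result<(), TxnError> {
        if self.rollbacks.contains_key(&start_ts) {
            return Err(TxnError::RolledBack);
        }
        self.lock = Some(start_ts);
        Ok(())
    }

    /// Remove the lock held by `start_ts` and record the rollback.
    pub fn rollback(&mut self, start_ts: u64, kind: RollbackKind) {
        if self.lock == Some(start_ts) {
            self.lock = None;
        }
        self.rollbacks.insert(start_ts, kind);
    }

    /// Assumed space-saving pass: non-protected records may be discarded,
    /// protected records are always kept.
    pub fn collapse(&mut self) {
        self.rollbacks
            .retain(|_, kind| *kind == RollbackKind::Protected);
    }

    /// Commit fails if a rollback record is visible or the lock is gone.
    pub fn commit(&mut self, start_ts: u64) -> Result<(), TxnError> {
        if self.rollbacks.contains_key(&start_ts) {
            return Err(TxnError::RolledBack);
        }
        if self.lock != Some(start_ts) {
            return Err(TxnError::LockNotFound);
        }
        self.lock = None;
        Ok(())
    }
}
```

In this model, once a non-protected record has been collapsed away, a late `prewrite(10)` followed by `commit(10)` succeeds, which is only safe if nobody can still be trying to commit. With a protected record, both calls return `Err(TxnError::RolledBack)` no matter how often `collapse` runs.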
1. Optimistic transaction t1 starts two-phase commit.
2. t1 sends prewrite. TiKV receives and processes it, but because of network issues the response never reaches the TiDB client.
3. The TiDB client retries the prewrite, but the retry is held up in the network and does not reach TiKV yet.
4. The lock written by the optimistic prewrite times out.
5. Concurrent transaction t2 calls CheckTxnStatus, finds that t1 has timed out, starts ResolveLock, and performs a non-protected rollback that leaves no rollback record.
6. The delayed prewrite from step 3 finally arrives at TiKV.
The scenario I described will definitely not happen, right?
Borrowing this thread to ask, what does panic mean in this context? I modified the code myself and encountered a panic here, but I don’t understand how it was generated.
First, CheckTxnStatus always writes a protected rollback. If your concern is that the ResolveLock path in the screenshot does not write a protected rollback, so that a late prewrite might succeed, then yes, that is theoretically possible, for both optimistic and pessimistic transactions. However, it does not affect transaction correctness, because CheckTxnStatus must already have written a protected rollback before that point. More precisely: for regular 2PC transactions, the primary must have received a protected rollback in CheckTxnStatus; for async commit transactions, it is also possible that a secondary receives the protected rollback during CheckSecondaryLocks. The transaction can therefore never reach the committed state, and the lock written by the late prewrite will eventually be cleaned up by ResolveLock.
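The argument in this reply can be replayed with a toy model. This is only a sketch: `ToyStore` and all of its methods are hypothetical, other transactions' lock conflicts are ignored, and the model simply takes for granted the reply's premise that CheckTxnStatus leaves a protected rollback record on the primary.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of one TiKV store; all names here are hypothetical.
#[derive(Default)]
pub struct ToyStore {
    /// Lock CF: key -> (start_ts, primary key) of the lock holder.
    locks: HashMap<String, (u64, String)>,
    /// Protected rollback records, identified by (key, start_ts).
    protected_rollbacks: HashSet<(String, u64)>,
}

impl ToyStore {
    fn holds_lock(&self, key: &str, start_ts: u64) -> bool {
        self.locks.get(key).map(|l| l.0) == Some(start_ts)
    }

    /// Prewrite succeeds unless a rollback record exists for (key, start_ts).
    pub fn prewrite(&mut self, key: &str, primary: &str, start_ts: u64) -> bool {
        if self.protected_rollbacks.contains(&(key.to_string(), start_ts)) {
            return false;
        }
        self.locks
            .insert(key.to_string(), (start_ts, primary.to_string()));
        true
    }

    /// Models CheckTxnStatus on an expired primary lock: remove the lock and
    /// (as the reply assumes) leave a protected rollback on the primary.
    pub fn check_txn_status_expired(&mut self, primary: &str, start_ts: u64) {
        if self.holds_lock(primary, start_ts) {
            self.locks.remove(primary);
        }
        self.protected_rollbacks
            .insert((primary.to_string(), start_ts));
    }

    /// Models the ResolveLock path from the thread: the lock is removed and
    /// no rollback record is written. Returns true if a lock was removed.
    pub fn resolve_lock_rollback(&mut self, key: &str, start_ts: u64) -> bool {
        if self.holds_lock(key, start_ts) {
            self.locks.remove(key);
            return true;
        }
        false
    }

    /// Committing the primary is refused once a protected rollback exists.
    pub fn commit_primary(&mut self, primary: &str, start_ts: u64) -> bool {
        if self.protected_rollbacks.contains(&(primary.to_string(), start_ts)) {
            return false;
        }
        if !self.holds_lock(primary, start_ts) {
            return false;
        }
        self.locks.remove(primary);
        true
    }

    pub fn is_locked(&self, key: &str) -> bool {
        self.locks.contains_key(key)
    }
}

/// Replays the late-prewrite scenario. Returns
/// (late prewrite accepted, commit accepted, secondary still locked at the end).
pub fn replay_late_prewrite() -> (bool, bool, bool) {
    let mut store = ToyStore::default();
    // t1 prewrites primary "p" and secondary "s" with start_ts 10.
    store.prewrite("p", "p", 10);
    store.prewrite("s", "p", 10);
    // t2 sees the expired primary lock and rolls t1 back.
    store.check_txn_status_expired("p", 10);
    store.resolve_lock_rollback("s", 10);
    // The delayed prewrite for the secondary arrives and leaves a new lock.
    let late_prewrite_ok = store.prewrite("s", "p", 10);
    // t1 can no longer commit, because the primary carries a protected rollback.
    let commit_ok = store.commit_primary("p", 10);
    // A later reader meets the stale lock and resolves it again.
    store.resolve_lock_rollback("s", 10);
    (late_prewrite_ok, commit_ok, store.is_locked("s"))
}
```

In this model `replay_late_prewrite()` returns `(true, false, false)`: the late prewrite does succeed, but the commit is refused and the leftover lock is removed by a later resolve. The conclusion depends entirely on the protected record existing on the primary.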
Thank you for clarifying.
This process involves the interaction of multiple interfaces, which indeed makes it quite difficult to understand. I might add sufficient comments to this code later on, so that future contributors can understand it more easily.
Judging from the CheckTxnStatus code, when the primary lock being checked has timed out, the rollback record that gets written does not appear to be of the protected type. This seems to differ from the logic you described.