TiKV Rollback Records and protect_rollback: Why Can a Rollback Be Left Unprotected?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIVK 回滚记录 protect_rollback 细节问题:为何可以不被 protect?

| username: ylldty

In the TiKV code, the Rollback interface normally writes a rollback record to the write CF. That way, if a delayed or retried prewrite arrives after the rollback has completed (for example, because of network problems), prewrite finds the rollback record and aborts.
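Here is a minimal sketch of that mechanism. It is a toy Python model, not TiKV's Rust code. One key's write CF is a dict from timestamp to record kind, and a rollback leaves a "rollback" record at the transaction's start_ts. The names `ToyKey`, `prewrite` and `rollback` are hypothetical. Real prewrite performs many more checks (write conflicts, lock types, and so on).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyKey:
    """One key in a toy MVCC store (hypothetical model, not TiKV's code).

    write_cf maps a timestamp to a record kind. A rollback record for a
    transaction is stored at that transaction's start_ts.
    """
    write_cf: Dict[int, str] = field(default_factory=dict)
    lock: Optional[int] = None  # start_ts of the lock owner, if any

    def rollback(self, start_ts: int) -> None:
        # Drop our own lock (if any) and leave a rollback record behind.
        if self.lock == start_ts:
            self.lock = None
        self.write_cf[start_ts] = "rollback"

    def prewrite(self, start_ts: int) -> str:
        # A rollback record at our own start_ts means this prewrite is late
        # (or a retry) and must not re-create the lock.
        if self.write_cf.get(start_ts) == "rollback":
            return "rolled_back"
        if self.lock is not None and self.lock != start_ts:
            return "locked"
        self.lock = start_ts  # take (or keep) the lock
        return "ok"


key = ToyKey()
print(key.prewrite(10))  # ok: transaction 10 takes the lock
key.rollback(10)         # transaction 10 is rolled back
print(key.prewrite(10))  # rolled_back: the delayed prewrite is rejected
print(key.lock)          # None
```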

However, there is one case in the Rollback code where no rollback record is written to the write CF. The code is as follows:

In this case nothing is written at all; not even the overlapped-rollback flag is set. How does a later prewrite recognize this situation and abort? Or is there some mechanism guaranteeing that, once the rollback has happened, no delayed prewrite can arrive because of network problems?

I found an old comment:

This comment was later invalidated by an issue:

As a result, rollbacks on the primary key of pessimistic transactions were made protected.
But don't optimistic transactions have the same problem? Optimistic prewrite does not need to find an existing lock, so why can optimistic rollbacks be left unprotected as well?

| username: neilshen

  • A non-protected rollback of transaction T means it is already known that T will definitely be rolled back and that no other concurrent process can be trying to commit T. This happens, for example, when the transaction rolls itself back. Note that this is not a ROLLBACK statement; it is the rollback a transaction enters when its commit fails partway through.
  • A protected rollback of transaction T means it is not known whether T is being committed concurrently, so the rollback itself must guarantee that T ends up rolled back. The rollback record written must therefore never be discarded under any circumstances, and any concurrent commit that encounters it must fail. This happens, for example, when another transaction resolves T's lock and decides to roll T back.
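The two definitions can be contrasted in a toy model of one key's write CF. The names are hypothetical and this is not TiKV's code. The model rests on two assumptions of ours, not statements from the thread:

1. The "nothing is written" case from the question arises when another transaction's commit record already sits at the rolled-back transaction's start_ts. In that case a protected rollback only sets an overlapped flag on the existing record.
2. Non-protected rollback records may later disappear. A hypothetical `discard_unprotected` step models this.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    kind: str                 # "commit" or "rollback"
    protected: bool = False   # meaningful only for rollback records
    overlapped: bool = False  # commit record that also marks a rollback


@dataclass
class ToyKey:
    """Toy write CF for one key (hypothetical model, not TiKV's code)."""
    write_cf: Dict[int, Record] = field(default_factory=dict)
    lock: Optional[int] = None  # start_ts of the lock owner, if any

    def rollback(self, start_ts: int, protected: bool) -> None:
        if self.lock == start_ts:
            self.lock = None
        existing = self.write_cf.get(start_ts)
        if existing is None:
            self.write_cf[start_ts] = Record("rollback", protected=protected)
        elif existing.kind == "rollback":
            # Already rolled back; never downgrade a protected record.
            existing.protected = existing.protected or protected
        elif protected:
            # Another transaction committed at exactly our start_ts: keep its
            # commit record but flag it so our rollback stays visible.
            existing.overlapped = True
        # else: a non-protected rollback over a commit record writes nothing.

    def discard_unprotected(self) -> None:
        # Non-protected rollback records may disappear later; protected ones may not.
        self.write_cf = {
            ts: r for ts, r in self.write_cf.items()
            if r.kind != "rollback" or r.protected
        }

    def prewrite(self, start_ts: int) -> str:
        r = self.write_cf.get(start_ts)
        if r is not None and (r.kind == "rollback" or r.overlapped):
            return "rolled_back"
        if self.lock is not None and self.lock != start_ts:
            return "locked"
        self.lock = start_ts
        return "ok"


# Non-protected rollback over another transaction's commit at ts 10: nothing
# is written, so a late prewrite of the rolled-back transaction succeeds.
a = ToyKey(write_cf={10: Record("commit")})
a.rollback(10, protected=False)
print(a.prewrite(10))  # ok

# A protected rollback in the same situation sets the overlapped flag instead.
b = ToyKey(write_cf={10: Record("commit")})
b.rollback(10, protected=True)
print(b.prewrite(10))  # rolled_back

# Only the protected rollback record survives a later cleanup.
c = ToyKey()
c.rollback(20, protected=False)
c.rollback(30, protected=True)
c.discard_unprotected()
print(c.prewrite(20), c.prewrite(30))  # ok rolled_back
```

The toy makes the trade-off visible. A non-protected rollback is cheaper and leaves less garbage, but it only blocks late prewrites for as long as its record happens to exist.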

(Note: The above reply is from my colleague)

| username: ylldty

Optimistic transaction t1 starts two-phase commit:

  • t1 sends prewrite. TiKV receives and processes it, but because of network issues the response never reaches the TiDB client.

  • The TiDB client retries the prewrite, but the retry is held up in the network and TiKV does not receive it yet.

  • The optimistic lock's TTL expires.

  • A concurrent transaction t2 calls CheckTxnStatus, finds that t1 has timed out, starts resolving the lock, and performs a non-protected rollback that writes no rollback record.

  • The delayed prewrite from t1 finally arrives at TiKV.

Is it guaranteed that the scenario I described can never happen?

| username: TiDBer_jYQINSnf

Taking the chance to ask in this thread: what does the panic here mean? I modified the code myself and hit a panic at this spot, but I don't understand what triggers it.

| username: neilshen

First, CheckTxnStatus always writes a protected rollback. Your concern may be that the ResolveLock step in the screenshot does not write a protected rollback, so a late prewrite could still succeed. In theory that can indeed happen, for both optimistic and pessimistic transactions. It does not affect transaction correctness, however, because CheckTxnStatus must already have written a protected rollback before that point. More precisely, for regular 2PC transactions, CheckTxnStatus must have written a protected rollback on the primary key. For async-commit transactions, a secondary key may also receive a protected rollback during CheckSecondaryLocks. Therefore the transaction can never reach the committed state, and any lock written by the late prewrite will eventually be cleaned up by ResolveLock.
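The argument above can be replayed in a toy model of the scenario from the earlier post. The names are hypothetical and this is not TiKV's code. The model makes three simplifications:

- It follows the reply's premise that CheckTxnStatus writes a protected rollback on the primary (the later posts question whether every code path does so).
- It takes the worst case, in which the non-protected rollback on the secondary leaves no record at all.
- It covers only regular 2PC, not async commit or CheckSecondaryLocks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyKey:
    """Toy per-key state (hypothetical model, not TiKV's code)."""
    protected_rollbacks: Set[int] = field(default_factory=set)  # start_ts values
    commits: Dict[int, int] = field(default_factory=dict)       # start_ts -> commit_ts
    lock: Optional[int] = None                                  # start_ts of lock owner

    def prewrite(self, start_ts: int) -> bool:
        # Blocked only by a surviving rollback record or another txn's lock.
        if start_ts in self.protected_rollbacks:
            return False
        if self.lock is not None and self.lock != start_ts:
            return False
        self.lock = start_ts
        return True

    def rollback(self, start_ts: int, protected: bool) -> None:
        if self.lock == start_ts:
            self.lock = None
        # Worst case from the question: a non-protected rollback leaves no record.
        if protected:
            self.protected_rollbacks.add(start_ts)

    def commit(self, start_ts: int, commit_ts: int) -> bool:
        # Committing a key requires this transaction's lock to still be there.
        if self.lock != start_ts:
            return False
        self.lock = None
        self.commits[start_ts] = commit_ts
        return True


def check_txn_status(primary: ToyKey, start_ts: int) -> str:
    if start_ts in primary.commits:
        return "committed"
    if start_ts in primary.protected_rollbacks:
        return "rolled_back"
    # Assume the primary lock has expired: roll it back with protection,
    # as the reply above states CheckTxnStatus does.
    primary.rollback(start_ts, protected=True)
    return "rolled_back"


def resolve_lock(key: ToyKey, start_ts: int, status: str) -> None:
    # Only the rollback branch is modeled; committing secondaries is omitted.
    if key.lock == start_ts and status == "rolled_back":
        key.rollback(start_ts, protected=False)


T1 = 10
primary, secondary = ToyKey(), ToyKey()
primary.prewrite(T1)
secondary.prewrite(T1)                  # first prewrite reached TiKV

status = check_txn_status(primary, T1)  # t2 finds t1 expired
resolve_lock(secondary, T1, status)     # non-protected, leaves no record

late_ok = secondary.prewrite(T1)        # delayed retry arrives: succeeds
primary_late_ok = primary.prewrite(T1)  # blocked by the protected rollback
commit_ok = primary.commit(T1, commit_ts=15)  # t1 can never commit

resolve_lock(secondary, T1, check_txn_status(primary, T1))  # stray lock cleaned
print(late_ok, primary_late_ok, commit_ok, secondary.lock)  # True False False None
```

The late secondary lock does appear, exactly as the question feared. However, whether t1 committed is decided only by the record on its primary key. The stray lock therefore can never lead to a commit and is rolled back the next time it is resolved.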

(Note: The above reply is from my colleague)

| username: ylldty

Thank you for clarifying.
This process involves several interacting interfaces, which makes it quite hard to follow. I may add comments to this code later so that future contributors can understand it more easily.

| username: ylldty

Looking at the CheckTxnStatus code, it seems that when the primary lock being checked has timed out, the rollback record it writes is not protected. Doesn't that contradict the logic you described?

| username: redgame

Optimistic transactions do not check the lock status during prewrite, so they may not need the protection that pessimistic transactions require.

| username: TiDBer_aaO4sU46

The scenario can occur with both optimistic and pessimistic transactions, but it does not affect transaction correctness.