Occasional Errors in Frontend Data Updates Due to Write Conflicts

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 前端更新数据偶尔的会报错写成冲突

| username: 舞动梦灵

There is a situation where the front-end payment is successful, and an SQL update statement is executed to update the status. Every day, there are a dozen SQL update statements indicating write conflicts. The update SQL is updating the primary key, and the internal update operation of TiDB is essentially inserting a new piece of data. The developer also investigated that when this SQL was executed, there were no other operations on the primary key ID being updated. Where should this issue be investigated?

SQL:

update table_pay
    <set >
      <if test="merchantid != null" >
        MERCHANTID = #{merchantid,jdbcType=VARCHAR},
      </if>
      <if test="pmParkId != null" >
        PM_PARK_ID = #{pmParkId,jdbcType=VARCHAR},
········十几个if判断
      <if test="createdTime != null" >
        CREATED_TIME = #{createdTime,jdbcType=TIMESTAMP},
      </if>
    </set>
    where ID = #{id,jdbcType=CHAR}

Error message:

Runtime exception [Exception], exception information: Could not commit JDBC transaction; nested exception is java.sql.SQLException: Write conflict, txnStartTS=446916360684437510, conflictStartTS=446916359989755947, conflictCommitTS=446916360684437536, key={tableID=134, handle=916307} primary={tableID=134, handle=916307} [try again later]
| username: tidb菜鸟一只 | Original post link

It should be concurrent updates to the same row, right?

| username: 舞动梦灵 | Original post link

However, the developer said that at that time, there was only one update operation on this primary key and no other operations.

| username: Jellybean | Original post link

First of all, this error must be caused by a write transaction conflict, which occurs when multiple transactions attempt to modify the same row of data, leading to a write conflict error. This situation is almost always caused by repeated writes on the business side.

You can check if the cluster is using optimistic locking and whether the business is also performing duplicate inserts during internal transaction retries.
Additionally, confirm whether the SQL updating the primary key is included in a larger transaction.

| username: wangccsy | Original post link

It is indeed possible that it is a primary key conflict or something similar.

| username: 舞动梦灵 | Original post link

They have confirmed that this is a write conflict occurring during the update operation. I will check if it is a larger transaction. Isn’t the database default to pessimistic locking? This is version 4.0.9. I saw on the official website that pessimistic locking is the default starting from version 3.x.

The business logic is that when a car exits the parking lot, it updates the payment status. After the payment is successful, it updates the payment status, and then the parking gate lifts. However, there was a write conflict when updating the payment status. As a result, the payment was successful, but because the payment status was not updated, the gate did not lift.

| username: 随缘天空 | Original post link

It would be best to add a locking mechanism at the program level. The database is probably using optimistic locking.

| username: andone | Original post link

Check whether the database is using pessimistic locking or optimistic locking.

| username: 哈喽沃德 | Original post link

Concurrency programming has nothing to do with databases.

| username: Jellybean | Original post link

  1. Regarding the use of pessimistic locks:
    Since v3.0.8, newly created TiDB clusters use the pessimistic transaction model by default. However, if a cluster created with version v3.0.7 or earlier is upgraded to version >= v3.0.8, the default transaction model will not change, meaning only newly created clusters will use the pessimistic transaction model by default.

  2. It is recommended to provide desensitized, complete business transaction SQL, or program processing logic and concurrency situation.

  3. When a lock conflict occurs, you need to first confirm whether the cluster’s transactions are optimistic or pessimistic. Check the cluster health and system table information:

  1. Analyze and observe the not_expired/resolve monitoring items in the Lock Resolve OPS panel and the txnLockFast monitoring item in the KV Backoff OPS panel under KV Errors through TiDB Grafana monitoring. If there is a significant upward trend, it may indicate a large number of transaction conflicts in the current environment. Among them, not_expired means the corresponding lock has not expired, resolve means the operation of trying to clear the lock, and txnLockFast represents a read-write conflict.

  2. The system tables TIDB_TRX and CLUSTER_TIDB_TRX provide information on all running transactions on the current TiDB node or the entire cluster, including whether the transaction is in a waiting lock state, the waiting lock time, and the Digest of the statements executed by the transaction.

| username: 舞动梦灵 | Original post link

  1. It was not an upgrade, but a direct installation of version 4.0.9. The default is pessimistic locking.
    image
  2. The complete update SQL was sent to me by the developer. The complete transaction has not been confirmed yet. The developer worked overtime last night and will be available in the afternoon.
update ipsp_union_pay
    <set >
      <if test="merchantid != null" >
        MERCHANTID = #{merchantid,jdbcType=VARCHAR},
      </if>
      <if test="pmParkId != null" >
        PM_PARK_ID = #{pmParkId,jdbcType=VARCHAR},
      </if>
      <if test="carNumber != null" >
        CAR_NUMBER = #{carNumber,jdbcType=VARCHAR},
      </if>
      <if test="amount != null" >
        AMOUNT = #{amount,jdbcType=INTEGER},
      </if>
      <if test="payOrderId != null" >
        PAY_ORDER_ID = #{payOrderId,jdbcType=VARCHAR},
      </if>
      <if test="parkOrderId != null" >
        PARK_ORDER_ID = #{parkOrderId,jdbcType=VARCHAR},
      </if>
      <if test="unionOrderId != null" >
        UNION_ORDER_ID = #{unionOrderId,jdbcType=VARCHAR},
      </if>
      <if test="sendTime != null" >
        SEND_TIME = #{sendTime,jdbcType=TIMESTAMP},
      </if>
      <if test="recvTime != null" >
        RECV_TIME = #{recvTime,jdbcType=TIMESTAMP},
      </if>
      <if test="payState != null" >
        PAY_STATE = #{payState,jdbcType=CHAR},
      </if>
      <if test="failDesc != null" >
        FAIL_DESC = #{failDesc,jdbcType=VARCHAR},
      </if>
      <if test="successTime != null" >
        SUCCESS_TIME = #{successTime,jdbcType=TIMESTAMP},
      </if>
      <if test="reserved != null" >
        RESERVED = #{reserved,jdbcType=VARCHAR},
      </if>
      <if test="queryTimes != null" >
        QUERY_TIMES = #{queryTimes,jdbcType=INTEGER},
      </if>
      <if test="lastQueryTime != null" >
        LAST_QUERY_TIME = #{lastQueryTime,jdbcType=TIMESTAMP},
      </if>
      <if test="callBackState != null" >
        CALL_BACK_STATE = #{callBackState,jdbcType=CHAR},
      </if>
      <if test="callBackCount != null" >
        CALL_BACK_COUNT = #{callBackCount,jdbcType=INTEGER},
      </if>
      <if test="callBackLastTime != null" >
        CALL_BACK_LAST_TIME = #{callBackLastTime,jdbcType=TIMESTAMP},
      </if>
      <if test="configVersion != null" >
        CONFIG_VERSION = #{configVersion,jdbcType=VARCHAR},
      </if>
      <if test="createdTime != null" >
        CREATED_TIME = #{createdTime,jdbcType=TIMESTAMP},
      </if>
    </set>
    where ID = #{id,jdbcType=CHAR}

3.1 Monitoring shows a noticeable trend of fluctuations at the corresponding time, but the data doesn’t seem to be very large.

3.2 Are the two tables you mentioned only available in the major version?

| username: 像风一样的男子 | Original post link

When it comes to development, you can’t trust even a single punctuation mark. Do the developers not acknowledge such obvious lock conflicts?

| username: 舞动梦灵 | Original post link

Yes, it’s definitely their problem. I checked another document in the community, and there are three possibilities. I still think their logic is flawed:

  1. Mixed use of optimistic and pessimistic locks
  2. Check if there is any involvement of the database proxy layer, whether there is link reuse or link misappropriation, and check if there are any error logs at the corresponding time points
  3. Bug in the database itself
| username: Jellybean | Original post link

Focus your efforts on investigating the first two issues.

| username: dba远航 | Original post link

It is estimated that unreasonable concurrency was used.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.