When the data service executes a transaction, if one of the statements reports an error, it will not be rolled back automatically and will be stuck until the 50s timeout

Kimberly_Switzer · July 24, 2025, 1:45am

Application environment:

The data service, http endpoint from tidb cloud

Reproduction method:

BEGIN;
insert into users (id) values (1);
insert into non_exists_table (id) values (1); // stuck here
COMMIT:

Problem:

data service can not rollback automaticly.

Resource allocation:

Attachment:

TiDB-AI-Assistant · July 24, 2025, 1:48am

1. Root Cause Analysis

The transaction remains stuck due to TiDB’s combination of:

Optimistic concurrency control: Errors are detected at commit phase rather than statement execution (source)
HTTP endpoint behavior: Maintains open transaction context until explicit COMMIT/ROLLBACK
Missing table error classification: Considered a non-fatal statement error rather than transaction-level failure
Application error handling gap: Failure to implement proper transaction cleanup after SQL errors

2. TiDB Transaction Handling Mechanism

When using TiDB Cloud’s HTTP endpoint:

Statement-level rollback: Failed INSERT rolls back only that statement’s effects
Transaction persistence: Open transaction holds resources until timeout
Distributed locking: Maintains locks across TiKV nodes during transaction lifespan
HTTP session binding: Transaction context persists across multiple requests in some configurations (source)

3. 50-Second Timeout Explanation

This duration is determined by:

tidb_idle_transaction_timeout (default 60s in TiDB Cloud)
HTTP endpoint’s max_execution_time setting
TiKV’s lock TTL (Time-To-Live) mechanism
Distributed deadlock detection interval

4. Step-by-Step Solutions

Immediate Fix:

# Sample error handling for HTTP endpoint transactions
from fastapi import HTTPException

try:
    with conn.transaction():
        conn.execute("INSERT INTO non_existent_table...")
except DatabaseError as e:
    conn.rollback()
    raise HTTPException(status_code=400, detail=str(e))

Configuration Adjustments:

Set lower idle timeout:

SET GLOBAL tidb_idle_transaction_timeout = 15000;  # 15 seconds

Enable pessimistic locking mode:
```
BEGIN PESSIMISTIC;
```

Preventative Measures:

Schema validation pre-check:

SELECT COUNT(*) FROM information_schema.tables 
WHERE table_name = 'target_table';

Implement exponential backoff retry:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def safe_transaction():
    # transaction logic

5. Best Practices

Explicit Error Handling
- Always wrap transactions in try/catch blocks
- Roll back immediately on any error (source)

Timeout Configuration

SET GLOBAL max_execution_time = 30000;  # 30s query timeout
SET GLOBAL tidb_txn_assertion_level = STRICT;  # Enable strict assertion

Monitoring

SELECT * FROM information_schema.tidb_transactions 
WHERE time_elapsed > 5000;  # Find long-running transactions

Schema Validation
- Use SHOW CREATE TABLE before critical operations
- Implement schema migration checks (source)

Connection Management

# Ensure connection cleanup
try:
    conn = pool.get_connection()
    # operations
finally:
    conn.close()  # Forces transaction cleanup

For implementation details, refer to TiDB’s official documentation on transaction error handling and HTTP endpoint configuration.

Hazel · July 24, 2025, 6:51am

Hi @Kimberly_Switzer thanks for your feedback! We’ve identified the issue and it’s now scheduled for a fix. I’ll keep you updated once there’s any progress.