Determining Large Transactions

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 大事务的判断 (Determining Large Transactions)

| username: Gavin_shi

How does TiDB determine large transactions?

| username: 啦啦啦啦啦 | Original post link

Mainly in terms of memory. For large transactions, it is necessary to plan memory in advance to ensure system stability and performance.

| username: zhanggame1 | Original post link

Do you have any specific requirements?

| username: Gavin_shi | Original post link

We are selecting a database for both AP and TP scenarios, and there will be INSERT INTO … SELECT … workloads, so we looked at TiDB’s documentation on large transactions but couldn’t find how TiDB defines them. Or is it left to the developers to decide what counts as a large transaction and then adjust at the planning and parameter level?

| username: zhanggame1 | Original post link

For INSERT INTO … SELECT …, you mainly need to evaluate the memory usage. If the data volume is too large, it will cause an OOM, and you need to adjust the memory parameters. By default, a single SQL statement in TiDB can use up to 1 GB of memory.
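The 1 GB default mentioned above is the `tidb_mem_quota_query` system variable. A minimal sketch of checking and raising it for one session (the quota value and table names here are illustrative, not from the original post):

```sql
-- Check the current per-statement memory quota (in bytes; default is 1 GiB)
SHOW VARIABLES LIKE 'tidb_mem_quota_query';

-- Raise it to 4 GiB for this session before running a large INSERT ... SELECT
SET SESSION tidb_mem_quota_query = 4 << 30;

INSERT INTO target_table SELECT * FROM source_table;
```

Note that recent TiDB versions have shifted memory control toward instance-level limits (e.g. `tidb_server_memory_limit` in v6.4+), so check the documentation for your version.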

| username: kelvin | Original post link

It is indeed necessary to plan well, otherwise frequent OOM (Out of Memory) issues can be very annoying.

| username: Kongdom | Original post link

:joy: You only know it was a large transaction after it has executed, right? Before execution, you can only estimate based on the data volume and the execution plan.

| username: andone | Original post link

Break large transactions into smaller transactions.
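One common way to do this split, assuming the table has an indexed filter column (the table and column names here are hypothetical), is to repeat a DML statement with a LIMIT until it affects no more rows:

```sql
-- Delete old rows in chunks instead of one huge transaction.
-- Re-run this statement (e.g. from a script) until it reports 0 affected rows.
DELETE FROM order_history
WHERE created_at < '2023-01-01'
LIMIT 5000;
```

Each run commits as its own small transaction, keeping memory usage and lock footprint bounded.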

| username: Jellybean | Original post link

TiDB limits the size of a single transaction, and this limit is enforced at the KV level.

At the SQL level, put simply, one row of data maps to one KV entry, and each additional index adds another KV entry. The limit therefore translates to the SQL level as:

  • The maximum size of a single row is 120 MB (in TiDB v5.0 and later this can be adjusted via the tidb-server configuration item performance.txn-entry-size-limit; versions before v5.0 support a single row of up to 6 MB).
  • The maximum size of a single transaction is 10 GB (in TiDB v4.0 and later this can be adjusted via the tidb-server configuration item performance.txn-total-size-limit; versions before v4.0 support a maximum transaction size of 100 MB).

Additionally, whether you are working against the size limit or a row-count limit, you also need to account for the overhead of TiDB’s encoding and extra transaction keys during execution. For optimal performance, it is recommended to write 100 to 500 rows per transaction.
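The two limits above can be inspected at runtime; a sketch, assuming TiDB v4.0 or later (SHOW CONFIG was introduced in v4.0):

```sql
-- Inspect the transaction size limits configured on the tidb-server
SHOW CONFIG WHERE type = 'tidb' AND name = 'performance.txn-total-size-limit';
SHOW CONFIG WHERE type = 'tidb' AND name = 'performance.txn-entry-size-limit';
```

These are configuration-file items rather than session variables, so changing them means editing the tidb-server configuration and restarting (or using the online mechanism your version supports).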

| username: h5n1 | Original post link

Use non-transactional DML for INSERT INTO … SELECT …; TiDB will execute it in batches according to the split you specify!
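Non-transactional DML (available for DELETE from TiDB v6.1 and for INSERT/UPDATE/REPLACE from v6.5) splits one statement into multiple auto-commit batches on a shard column. A sketch with hypothetical table names:

```sql
-- Split the INSERT ... SELECT into batches of 10000 rows, sharded on t1.id.
-- Each batch commits independently, so the overall operation is NOT atomic.
BATCH ON test.t1.id LIMIT 10000
INSERT INTO t2 SELECT * FROM t1;

-- Preview how the statement would be split, without executing it
BATCH ON test.t1.id LIMIT 10000 DRY RUN
INSERT INTO t2 SELECT * FROM t1;
```

Because batches commit independently, a failure partway through leaves some batches applied; this trades atomicity for bounded transaction size.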

| username: swino | Original post link

In TiDB, if a transaction accesses a large range of data and lasts for a long time, it is considered a large transaction. Large transactions can significantly impact system performance and throughput because they need to lock a large amount of data, which may lead to lock waits and deadlocks.

Therefore, TiDB provides a mechanism to identify large transactions. In TiDB, large transactions are identified in the following two ways:

  1. Logging Transaction Operations

When TiDB receives a transaction, it logs all operations within it and marks them as belonging to a single transaction. If the number of SQL statements executed in the transaction exceeds the configured threshold, or its duration exceeds the configured threshold, it is considered a large transaction. TiDB’s default thresholds are more than 300 SQL statements or an execution time exceeding 5 seconds.

  2. Checking Lock Wait Time

If a transaction is waiting for a lock for a time exceeding the system’s set threshold, it is considered a large transaction. TiDB’s default threshold is a lock wait time exceeding 50 seconds.

When TiDB identifies a transaction as a large transaction, it marks the transaction as such and may warn or refuse to execute the transaction. This mechanism can be adjusted by modifying TiDB’s configuration file to suit different business needs.

In summary, TiDB identifies large transactions by logging operation details and checking lock wait times to avoid significant impacts on system performance and throughput caused by large transactions.

| username: 春风十里 | Original post link

The maximum supported single transaction size is 10GB (for TiDB v4.0 and later versions, it can be adjusted through the tidb-server configuration item performance.txn-total-size-limit; for versions earlier than TiDB v4.0, the maximum supported single transaction size is 100MB).

Additionally, note that regardless of size or row count limits, you also need to consider the overhead of TiDB encoding and additional transaction keys during transaction execution. For optimal performance, it is recommended to write 100 to 500 rows per transaction.

The maximum supported single transaction size is 10GB.
Does this transaction size refer to the size of the updated data?
If so, isn’t TiDB’s support for large transactions rather unfriendly? For a large table, a 10 GB limit means a single update would have to be split into many transactions, and since TiDB currently does not support stored procedures, wouldn’t making changes to a large table be very troublesome?

| username: zhanggame1 | Original post link

Increase the memory limit parameter; as long as it doesn’t OOM, you don’t need to worry about it.

| username: 随缘天空 | Original post link

I think a transaction should be considered large once the data it involves exceeds a certain threshold. For example, TiDB’s default threshold for a single SQL statement, tidb_mem_quota_query, is 1 GB. When your SQL exceeds that memory usage, it may error out and be considered a large transaction.

| username: 路在何chu | Original post link

However, sometimes it doesn’t work. I set it to 4 GB, but a single SQL statement consumed over ten gigabytes of memory without any error.
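One possible explanation (an assumption, worth verifying on your cluster) is the configured action taken when a statement exceeds the quota: if it is set to log rather than cancel, the statement keeps running past the limit. A quick check, assuming TiDB v6.1+ where this is a system variable:

```sql
-- If this returns LOG, statements exceeding tidb_mem_quota_query are only
-- logged, not cancelled (CANCEL is the default in recent versions).
SHOW VARIABLES LIKE 'tidb_mem_oom_action';
```

On versions before v6.1, the equivalent setting is the `oom-action` item in the tidb-server configuration file.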

| username: forever | Original post link

I see you’ve been promoting this feature a lot. How’s it working for you? :grin:

| username: 随缘天空 | Original post link

That shouldn’t happen. We got an error when executing a large SQL statement before, so your SQL probably hasn’t actually reached the configured limit.

| username: zhanggame1 | Original post link

TiDB’s default memory limit of 1 GB is too small. Raising it to 10 GB can handle most scenarios.

| username: TiDBer_gxUpi9Ct | Original post link

What are the requirements?

| username: dba远航 | Original post link

One criterion is execution time, the other is memory usage. It is recommended to split large transactions into smaller ones to avoid OOM (Out of Memory).