Can an update of 5 million rows be executed directly, or does it need to be done in batches?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 500万数据更新操作可以直接执行吗还是需要分批

| username: 小于同学

Can the update operation for 5 million records be executed directly, or does it need to be done in batches?

| username: 迪迦奥特曼 | Original post link

Because of the size limits imposed by the txn-entry-size-limit and txn-total-size-limit parameters, it is best to split the operation into batches in practice.
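
Before deciding, you can check what the current limits are on your cluster. A minimal sketch, assuming TiDB v4.0 or later (where `SHOW CONFIG` is available) and the default configuration item names from the `[performance]` section of `tidb.toml`:

```sql
-- Inspect the transaction size limits on the TiDB nodes.
-- Item names below assume the default tidb.toml [performance] section.
SHOW CONFIG WHERE type = 'tidb' AND name = 'performance.txn-total-size-limit';
SHOW CONFIG WHERE type = 'tidb' AND name = 'performance.txn-entry-size-limit';
```

If the estimated size of the 5-million-row update approaches these limits, batching is the safer route.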

| username: 濱崎悟空 | Original post link

A data volume of this size is usually processed in batches.

| username: zhaokede | Original post link

Process it in batches and avoid large transactions.

| username: zhanggame1 | Original post link

Check whether there is enough memory; if so, executing it in one go should be fine.

| username: Jellybean | Original post link

Executing it in one go may hit the transaction size limit. Even if it happens to stay within the limit and runs to completion, the execution time will likely be long.

Processing in batches is recommended; it is also the best practice described in the official documentation.
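
If the cluster is on a version recent enough to support non-transactional `BATCH ... UPDATE` (roughly v6.5 and later), TiDB can do the splitting itself. A sketch with hypothetical table and column names (`orders`, `id`, `status`, `created_at`):

```sql
-- Let TiDB split the statement into independent 1000-row batches keyed
-- on the id column. Batches commit independently, so a mid-way failure
-- can leave the table partially updated; the WHERE condition should
-- therefore be safe to re-run.
BATCH ON id LIMIT 1000
UPDATE orders
SET    status = 'archived'
WHERE  created_at < '2023-01-01';
```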

| username: FutureDB | Original post link

Updating 5 million records at once may lead to two issues:

  1. Exceeding the transaction size limit, so the statement fails to execute;
  2. Excessive memory usage, which can cause TiDB nodes to OOM (run out of memory).

It is recommended to split the update into batches (see the sketch below).
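
A minimal manual-batching sketch for clusters where non-transactional DML is not available; the table and column names are hypothetical, and the driving loop lives in the application or a shell script:

```sql
-- Update at most 1000 matching rows per statement and repeat until the
-- statement reports 0 rows affected. Each run is its own small
-- transaction, so it stays well under the transaction size limits.
UPDATE orders
SET    status = 'archived'
WHERE  status = 'active'
  AND  created_at < '2023-01-01'
LIMIT  1000;
```

The loop converges only because updated rows no longer match the WHERE condition; the filter columns should be indexed so each iteration does not scan the whole table.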

| username: 数据库真NB | Original post link

If no one else is using the cluster and there is enough time to wait for execution, completing the task in one go poses no technical issues for the database.

| username: Kongdom | Original post link

I suggest executing in batches, unless the hardware resources are particularly ample.