Efficient Methods for Bulk Data Updates

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 求大批量数据更新的高效方法 (Looking for an efficient method for bulk data updates)

| username: Running

More than 80 million rows need to be updated to change each user's affiliated school according to a business requirement. The update has been running for a day and still hasn't finished. Is there an efficient way to do this?

| username: db_user | Original post link

First, check whether the join is actually using an index. If it is, the update shouldn't be this slow.

| username: xfworld | Original post link

Step-by-step process:

  1. Find the primary keys of the records you want to update.
  2. Batch update these primary keys.

You also need to watch out for large transactions; it is best to split the work into batches.

If you use the join method, the query itself is likely to take a long time.
You can use EXPLAIN to check the execution plan of the query…
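The two-step process above can be sketched as follows. This is an illustrative example, not the poster's actual code: it uses SQLite and an invented `users(id, school)` table, with commits after each batch so that no single transaction grows too large:

```python
import sqlite3

BATCH_SIZE = 1000  # rows updated per transaction (tune for your workload)

# Hypothetical table standing in for the real 80M-row one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, school TEXT)")
conn.executemany("INSERT INTO users (id, school) VALUES (?, ?)",
                 [(i, "old") for i in range(1, 5001)])
conn.commit()

# Step 1: find the primary keys of the records to update.
ids = [row[0] for row in
       conn.execute("SELECT id FROM users WHERE school = 'old'")]

# Step 2: update those keys batch by batch, committing after each batch
# so every transaction stays small.
for start in range(0, len(ids), BATCH_SIZE):
    batch = ids[start:start + BATCH_SIZE]
    placeholders = ",".join("?" * len(batch))
    conn.execute(
        f"UPDATE users SET school = 'new' WHERE id IN ({placeholders})",
        batch)
    conn.commit()
```

Collecting the keys first means the expensive filtering (or join) runs once as a read, and each write transaction only touches a bounded number of rows by primary key.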

| username: Running | Original post link

It looks like it’s using the index.

| username: Running | Original post link

I changed the implementation, and it is somewhat faster now. Will the official team consider this optimization in a future version?

| username: xfworld | Original post link

Which version are you using?

| username: yulei7633 | Original post link

Try a different approach and split the SQL query.

| username: WalterWj | Original post link

Try this: Paginate Results (分页查询) | PingCAP 文档中心
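The linked doc paginates by primary key. A minimal sketch of that idea applied to a bulk update, again using SQLite and an invented `users` table: instead of collecting all keys up front, walk the table in primary-key order and update one page per transaction:

```python
import sqlite3

PAGE_SIZE = 500  # rows per page/transaction (illustrative value)

# Hypothetical table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, school TEXT)")
conn.executemany("INSERT INTO users (id, school) VALUES (?, ?)",
                 [(i, "old") for i in range(1, 2001)])
conn.commit()

last_id = 0
while True:
    # Fetch the next page of primary keys after the last one processed.
    page = [r[0] for r in conn.execute(
        "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE_SIZE))]
    if not page:
        break
    # Update the whole page as one small, key-range-bounded transaction.
    conn.execute(
        "UPDATE users SET school = 'new' WHERE id > ? AND id <= ?",
        (last_id, page[-1]))
    conn.commit()
    last_id = page[-1]
```

Each iteration is driven only by the cursor `last_id`, which is what makes the loop restartable, though as the reply below notes, the looping itself still has to be scripted by hand.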

| username: Running | Original post link

That only implements another batching method; it still doesn't automate the batching process.