Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: What indicates that a Lightning parallel import has finished?
TiDB Lightning's parallel import has no central coordinator that manages the individual Lightning instances. So how does TiDB know when the import as a whole has finished? If one Lightning task is started only after the other Lightning tasks have ended, can it still import successfully? A regular import switches TiKV into import mode and switches it back to normal mode after verification. How does this mode switch work during a parallel import? Also, parallel import requires the target table to be empty. If a new Lightning instance is started after another one has already finished importing, the table is no longer empty. Doesn't that contradict the empty-table requirement?
Good question. You could test it; we'd like to know the result too. That said, in a parallel import the instances are supposed to start at the same time. If one finishes while the others are still running, the overall import is considered incomplete. There should be a switch that controls this.
Parallel import means starting the tasks at the same time. Starting a Lightning task after the other Lightning tasks have already ended, as you describe, does not count as parallel.
Check the logs; they contain the following messages:
[INFO] [restore.go:442] ["the whole procedure completed"] [takeTime=108.167654ms]
This indicates that the entire import procedure has completed.
[INFO] [main.go:106] ["tidb lightning exit"] [finished=true]
This indicates that TiDB Lightning has exited.
In practice, the instances will always start in some order. In the official course I also saw a case where the previous Lightning instance had already finished and the next one was started as usual, which is why I'm asking.
That log is what a single Lightning process sees; there is no clear marker for the parallel import as a whole.
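Because each instance only reports its own completion, one practical workaround is to treat the parallel import as finished only when every instance's log shows both markers quoted above. The following is a minimal sketch under that assumption: it assumes each instance writes its own log in the format shown in this thread, and the names `instance_finished` and `unfinished_instances` are hypothetical helpers, not part of TiDB Lightning. It does not query the metadata Lightning keeps in TiDB, and a missing marker can mean either "still running" or "crashed".

```python
import re
from typing import Dict, List

# Completion markers as quoted from a single Lightning log in this thread.
# Assumption: every instance writes its own log in this same format.
PROCEDURE_DONE = '"the whole procedure completed"'
EXIT_LINE = re.compile(r'"tidb lightning exit".*\[finished=true\]')


def instance_finished(log_text: str) -> bool:
    """True if one instance's log has both the completion line and the clean-exit line."""
    return PROCEDURE_DONE in log_text and EXIT_LINE.search(log_text) is not None


def unfinished_instances(logs: Dict[str, str]) -> List[str]:
    """Names of instances whose logs lack the markers (still running, or failed)."""
    return sorted(name for name, text in logs.items() if not instance_finished(text))


if __name__ == "__main__":
    logs = {
        "lightning-1": (
            '[INFO] [restore.go:442] ["the whole procedure completed"] [takeTime=108.167654ms]\n'
            '[INFO] [main.go:106] ["tidb lightning exit"] [finished=true]\n'
        ),
        # Hypothetical log of an instance that has not finished yet.
        "lightning-2": '[INFO] ["hypothetical progress line"]\n',
    }
    pending = unfinished_instances(logs)
    print("all finished" if not pending else f"still running or failed: {pending}")
```

Requiring both lines, rather than only the exit line, guards against an instance that exited without completing its procedure.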
I tried hard to understand, but it still looks like the log of an individual import. The official documentation states, “In terms of technical implementation, TiDB Lightning records metadata of each instance and each imported table in the target TiDB, coordinating the Row ID allocation range of different instances, the recording of global Checksum, and the configuration changes and recovery of TiKV and PD.” However, I haven't found a more detailed explanation.
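To make the idea of "coordinating the Row ID allocation range of different instances" more concrete, here is a toy model, not Lightning's actual schema or algorithm. It assumes a single shared per-table metadata record (the hypothetical `TableMeta` class) and simulates the transactional update Lightning would perform in the target TiDB with a `threading.Lock`. Each instance reserves a disjoint, half-open range of row IDs, so rows written concurrently by different instances cannot collide.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TableMeta:
    """Hypothetical stand-in for a per-table metadata record shared by all instances."""
    next_row_id: int = 1
    ranges: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def allocate(self, instance: str, row_count: int) -> Tuple[int, int]:
        """Reserve the half-open range [start, end) of row IDs for one instance.

        The lock models a transactional read-modify-write on the shared
        metadata, so concurrent instances never receive overlapping ranges.
        """
        with self.lock:
            start = self.next_row_id
            end = start + row_count
            self.next_row_id = end
            self.ranges[instance] = (start, end)
            return start, end


if __name__ == "__main__":
    meta = TableMeta()
    workers = [
        threading.Thread(target=meta.allocate, args=(f"lightning-{i}", 1000))
        for i in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # In this toy model each instance writes only inside its own range, so
    # from its point of view the target behaves like an empty table.
    for name, (start, end) in sorted(meta.ranges.items()):
        print(name, start, end)
```

Which instance gets which range depends on thread scheduling; only the disjointness of the ranges is guaranteed.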
Do all of your parallel import tasks target the same table?
If the target tables are different, each instance manages its own table. I don't quite understand the scenario where the target table is the same.
If the target table is the same, why run multiple Lightning instances together? I don't see what scenario that would be. Doing this will definitely fail in local mode, and in TiDB mode the writes are converted to REPLACE, which doesn't improve efficiency much either.
This corresponds to the scenario in Example 1.
TiDB Lightning exited successfully.
The documentation is quite clear. In parallel mode, Lightning logically separates the import tasks through metadata management. From this perspective, the target of each Lightning instance is an “empty table.”