What is the indicator for the completion of parallel import in Lightning?

translator_bot · June 21, 2024, 11:52am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lightning并行导入结束的标识是什么

| username: sjtuxpy

TiDB Lightning parallel import has no central coordinator managing the individual Lightning instances, so how does TiDB know when an import has finished? If a Lightning task is started after the other Lightning tasks have already ended, can it still import successfully? A regular import switches TiKV to import mode and switches it back to normal mode after verification; how does this state change during a parallel import? Parallel import also requires the target table to be empty. If a new Lightning instance is started after another instance has finished importing, the table is no longer empty. Doesn't that contradict the empty-table requirement?

| username: dba远航

Good question. You could test it; we would also like to know the result. I would expect the instances in a parallel import to start at the same time, though. If one finishes while the others are still running, the overall import should count as incomplete. There should be a switch that controls this.

| username: Jolyne

Parallel import means starting the tasks at the same time. Starting a Lightning task after the other Lightning tasks have ended, as you describe, does not count as parallel.

| username: 大飞哥online

Check the log. It contains the following messages:
[INFO] [restore.go:442] ["the whole procedure completed"] [takeTime=108.167654ms]
This indicates that the entire import process has completed.
[INFO] [main.go:106] ["tidb lightning exit"] [finished=true]
This indicates that TiDB Lightning has exited.
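
As a rough illustration, the check described in this reply could be scripted along the following lines. This is only a sketch: the function names are made up for the example, the two message strings are copied from the log lines quoted in this reply (their exact wording may differ between Lightning versions), and treating "every instance has logged both messages" as the end of a parallel import is an assumption of the sketch, not a documented Lightning mechanism.

```python
from typing import Iterable, Mapping

# Message texts copied from the log lines quoted in this thread.
# Assumption: other Lightning versions may word them differently.
PROCEDURE_DONE = "the whole procedure completed"
EXIT_MESSAGE = "tidb lightning exit"
FINISHED_FIELD = "[finished=true]"


def _normalize(line: str) -> str:
    """Turn curly quotes (often introduced by forum software) into straight ones."""
    return line.replace("\u201c", '"').replace("\u201d", '"')


def instance_finished(log_lines: Iterable[str]) -> bool:
    """Return True if one instance's log shows both completion messages."""
    saw_done = False
    saw_exit = False
    for raw in log_lines:
        line = _normalize(raw)
        if PROCEDURE_DONE in line:
            saw_done = True
        if EXIT_MESSAGE in line and FINISHED_FIELD in line:
            saw_exit = True
    return saw_done and saw_exit


def all_instances_finished(logs: Mapping[str, Iterable[str]]) -> bool:
    """Hypothetical helper: True only if every instance's log looks finished.

    `logs` maps an instance label (any name you choose) to its log lines.
    An empty mapping is treated as "not finished".
    """
    return bool(logs) and all(instance_finished(lines) for lines in logs.values())
```

To use it, read each instance's log file into a list of lines and pass them in under any label, for example `all_instances_finished({"node1": lines1, "node2": lines2})`. The check is still a per-process one applied several times; it does not read the metadata that Lightning records in the target TiDB.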

| username: hey-hoho

Refer to this post:

| username: sjtuxpy

In practice the instances always start in some order. In the official course I also saw the next Lightning instance started as usual after the previous one had already ended, which is why I am asking.

| username: sjtuxpy

These messages are what a single Lightning process logs; there is no clear marker for the parallel import as a whole.

| username: sjtuxpy

I tried hard to understand it, but it still reads like the log of an individual import. The official documentation states, “In terms of technical implementation, TiDB Lightning records metadata of each instance and each imported table in the target TiDB, coordinating the Row ID allocation range of different instances, the recording of global Checksum, and the configuration changes and recovery of TiKV and PD.” However, I still haven’t found a more detailed explanation.

| username: hey-hoho

Are the target tables for your parallel import tasks all the same?

| username: sjtuxpy

If the target tables are different, each instance manages its own. What I don’t quite understand is the scenario where the target tables are the same.

| username: hey-hoho

If the target table is the same, why run multiple Lightning instances together? I don’t understand what scenario this would be. It will definitely fail in local mode, and in TiDB mode the writes are converted to replace, which won’t improve efficiency much either.

| username: sjtuxpy

This corresponds to the scenario in Example 1.

| username: xingzhenxiang

TiDB Lightning exited successfully.

| username: hey-hoho

The documentation is quite clear. In parallel mode, Lightning uses metadata management to keep multiple import tasks logically separate. From this perspective, the target of each Lightning instance is an “empty table.”