How to Ensure Ingested Data Does Not Conflict with Incremental Data When Creating Indexes Asynchronously?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 异步创建索引时,ingest数据如何保证与增量数据不冲突的?

| username: Maverick

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v7.0.0
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
I have a question after reading the “TiDB Adding Index Acceleration” design document. In this accelerated index build mode, can an ingested SST conflict with data already at the bottom level of RocksDB, so that it cannot be ingested? If so, how is this currently handled?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: Billmay表妹

Please share a link to the document you read. Also, is it official documentation?

| username: Billmay表妹

In TiDB, during the asynchronous index creation process, if there are simultaneous data write or update operations, it may lead to conflicts between index data and incremental data. To avoid this situation, TiDB adopts the following two methods to ensure the correctness of index data:

  1. During the asynchronous index creation process, TiDB writes new data to both the original table and the index table. This ensures that the data in the index table remains consistent with the data in the original table. Additionally, TiDB records which data has already been written to the index table so that this data can be removed from the original table once the index creation is complete.

  2. During the asynchronous index creation process, TiDB writes incremental data to a temporary table instead of directly to the original table. This avoids conflicts between index data and incremental data. Once the index creation is complete, TiDB writes the data from the temporary table to the index table and removes this data from the original table.

It is important to note that during the asynchronous index creation process, there may be issues with query result inconsistency. This is because, before the index creation is complete, queries may access both the original table and the index table, and the data in these two tables may be inconsistent. To avoid this situation, TiDB automatically selects the correct table for queries to ensure the correctness of the query results.

In summary, TiDB ensures the correctness of index data by writing new data to both the original table and the index table or by writing incremental data to a temporary table. Additionally, it automatically selects the correct table for queries to avoid issues with query result inconsistency.
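
The second method described above can be pictured with a small toy model. The sketch below is not TiDB code: `build_index_with_temp_buffer`, the dict-based table, and the `TOMBSTONE` marker are hypothetical names chosen for illustration. It assumes the backfill reads a fixed snapshot while concurrent writes are recorded in a temporary buffer, which is then merged on top of the backfilled entries so that the newer write wins.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical marker meaning "this row was deleted while the backfill ran".
TOMBSTONE = object()

Row = Dict[str, int]  # toy row, e.g. {"v": 10}; the index is built on column "v"


def build_index_with_temp_buffer(
    snapshot: Dict[int, Row],
    concurrent_writes: List[Tuple[int, Optional[Row]]],
) -> Dict[int, int]:
    """Return a toy index mapping row id -> indexed value.

    snapshot: rows visible when the backfill started.
    concurrent_writes: (row id, new row) pairs in arrival order;
        a new row of None means the row was deleted.
    """
    # Step 1: backfill index entries from the fixed snapshot.
    backfilled = {rid: row["v"] for rid, row in sorted(snapshot.items())}

    # Step 2: writes that arrive during the backfill go to a temporary
    # buffer instead of the index being built; later writes overwrite
    # earlier ones for the same row id.
    temp: Dict[int, object] = {}
    for rid, row in concurrent_writes:
        temp[rid] = TOMBSTONE if row is None else row["v"]

    # Step 3: merge the buffer over the backfilled entries, so the
    # newer (incremental) value always replaces the snapshot value.
    index = dict(backfilled)
    for rid, value in temp.items():
        if value is TOMBSTONE:
            index.pop(rid, None)
        else:
            index[rid] = value
    return index
```

For example, with a snapshot of rows 1 and 2, an update to row 2, an insert of row 3, and a delete of row 1 during the backfill, the merged index contains only rows 2 and 3 with their latest values. Because the two data sources never write to the same structure at the same time, the order of the merge alone decides which version survives.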

| username: Maverick

The official documentation: docs/design/2022-06-07-adding-index-acceleration.md on the master branch of the pingcap/tidb repository on GitHub.

| username: Maverick

My understanding of the accelerated index creation process is as follows:

  1. For existing data, take a snapshot, read the data from TiKV, build SST files, and ingest them into TiKV.
  2. For incremental data, write it through the normal write path.

My question is: suppose incremental index data has already been compacted down to RocksDB’s last level, and its key range overlaps with the SST built from the existing data, so that the ingestion fails. Can this happen? If so, how is it currently resolved?
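
For context on the overlap concern, here is a toy model of how RocksDB's external-file ingestion is documented to choose a target level: it walks down from L0 and keeps the deepest level reachable before the first level whose keys overlap the new file. The function `pick_ingest_level` and the key-range representation are hypothetical. Memtable flushes, running compactions, level capacity, and sequence numbers are deliberately left out, and this is not TiKV's actual ingest code.

```python
from typing import List, Tuple

KeyRange = Tuple[str, str]  # inclusive (smallest_key, largest_key)


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """Two inclusive key ranges overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def pick_ingest_level(levels: List[List[KeyRange]], new_file: KeyRange) -> int:
    """Pick the deepest level for new_file without crossing an overlapping level.

    levels[0] is L0 and levels[-1] is the bottom level. Scanning stops at
    the first level that overlaps the new file, because placing the file
    at or below that level would put it underneath existing keys.
    """
    target = 0  # L0 is always allowed as a fallback
    for lvl, files in enumerate(levels):
        if any(overlaps(new_file, f) for f in files):
            break
        target = lvl
    return target


# Hypothetical LSM shape: empty L0, one file in L1, one file in L2 (bottom).
levels = [[], [("a", "c")], [("m", "p")]]
no_overlap = pick_ingest_level(levels, ("d", "f"))
bottom_overlap = pick_ingest_level(levels, ("n", "o"))
```

In this model, a file that overlaps the bottom level is not rejected; it is simply placed in a shallower level. Which version of a key is visible afterwards depends on sequence numbers, which this sketch does not model.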