[tinykv] What are the possible causes of the "invalid region meta key length" error?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: [tinykv] invalid region meta key length 错误的可能原因是什么

| username: sa_ka_na

When running TestSplitRecover3B, I get the following error: invalid region meta key length for key [1 3 0 0 0 0 0 0 0 1 1 255 255 255 255 255 255 255 211] [recovered]

The error appears in this scenario: a peer was stopped earlier and is then restarted. On restart it calls the loadPeers function, which reads the region for each peer back from stable storage, using the previously persisted RegionStateKey as the key and taking the stored value as the region. Because the RegionStateKey is encoded, loadPeers has to call DecodeRegionMetaKey to decode it, and that is where the error occurs. DecodeRegionMetaKey checks whether the length of the key matches the expected length, and reports the error above if it does not.

This error is reproducible and occurs every time the first peer is restarted.

From the logs, [1 3 0 0 0 0 0 0 0 1 1] is the region state key for region 1. Its length is 11, which matches the expected length. After the restart, however, the region state key read back from stable storage is [1 3 0 0 0 0 0 0 0 1 1 255 255 255 255 255 255 255 211], which is 19 bytes long: the original key followed by 8 extra bytes. It looks as if the data in stable storage has been corrupted.
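To make the failing check concrete, here is a minimal sketch of a length check like the one described above. It is not TinyKV's code: decodeRegionMetaKey, regionMetaPrefix, and regionMetaKeyLen are hypothetical names, and the key layout (a 2-byte prefix, an 8-byte big-endian region ID, and a 1-byte suffix) is an assumption inferred from the 11-byte key in the logs.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Assumed layout, inferred from the logged key [1 3 0 0 0 0 0 0 0 1 1]:
// 2-byte prefix, 8-byte big-endian region ID, 1-byte suffix.
var regionMetaPrefix = []byte{0x01, 0x03}

const regionMetaKeyLen = 2 + 8 + 1

// decodeRegionMetaKey is a hypothetical stand-in for DecodeRegionMetaKey.
// It rejects any key whose length differs from the expected 11 bytes.
func decodeRegionMetaKey(key []byte) (regionID uint64, suffix byte, err error) {
	if len(key) != regionMetaKeyLen {
		return 0, 0, fmt.Errorf("invalid region meta key length for key %v", key)
	}
	if !bytes.HasPrefix(key, regionMetaPrefix) {
		return 0, 0, fmt.Errorf("invalid region meta key prefix for key %v", key)
	}
	regionID = binary.BigEndian.Uint64(key[len(regionMetaPrefix):])
	return regionID, key[len(key)-1], nil
}

func main() {
	good := []byte{1, 3, 0, 0, 0, 0, 0, 0, 0, 1, 1}
	bad := []byte{1, 3, 0, 0, 0, 0, 0, 0, 0, 1, 1, 255, 255, 255, 255, 255, 255, 255, 211}

	// The 11-byte key decodes to a region ID and a suffix byte.
	id, suffix, err := decodeRegionMetaKey(good)
	fmt.Println(id, suffix, err)

	// The 19-byte key fails the length check before anything is decoded.
	_, _, err = decodeRegionMetaKey(bad)
	fmt.Println(err)
}
```

Under these assumptions the check itself is behaving correctly; the thing to investigate is how a 19-byte key ended up in storage.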

Has anyone encountered a similar issue? What is the solution?

| username: sa_ka_na | Original post link

I found the cause: the same WriteBatch was mistakenly written twice, which corrupted the data in Badger.
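The post does not say how the double write produced the longer key. One possible mechanism, offered purely as an assumption, is that the storage layer appends an 8-byte version suffix to each entry's key in place when the batch is committed. The 8 extra bytes in the logged key, read as a big-endian uint64, equal math.MaxUint64 - 44, which would fit such a suffix. The toy model below uses hypothetical types (entry, batch, toyStore), not TinyKV or Badger APIs, to show how writing the same batch twice would then store a user key that is 8 bytes too long, and how emptying the batch after each write avoids it.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// entry and batch are hypothetical stand-ins, not TinyKV or Badger types.
type entry struct{ key, value []byte }

type batch struct{ entries []*entry }

// toyStore models (as an assumption) a store that appends an 8-byte
// version suffix to each entry's key in place when a batch is written.
type toyStore struct {
	version uint64
	keys    [][]byte // stored keys, in write order
}

func (s *toyStore) write(b *batch) {
	if len(b.entries) == 0 {
		return
	}
	s.version++
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], math.MaxUint64-s.version)
	for _, e := range b.entries {
		e.key = append(e.key, ts[:]...) // mutates the caller's entry
		s.keys = append(s.keys, append([]byte(nil), e.key...))
	}
}

// writeAndReset writes the batch and then empties it, so an accidental
// second write of the same batch becomes a no-op.
func (s *toyStore) writeAndReset(b *batch) {
	s.write(b)
	b.entries = nil
}

// userKey strips the single version suffix the store is expected to have added.
func userKey(stored []byte) []byte { return stored[:len(stored)-8] }

func newBatch() *batch {
	key := []byte{1, 3, 0, 0, 0, 0, 0, 0, 0, 1, 1}
	return &batch{entries: []*entry{{key: key, value: []byte("region")}}}
}

func main() {
	// Bug: the same batch is written twice, so the second write
	// stores a key that already carries a version suffix.
	buggy := &toyStore{}
	b := newBatch()
	buggy.write(b)
	buggy.write(b)
	for _, k := range buggy.keys {
		fmt.Println(len(userKey(k)), userKey(k))
	}

	// Guard: the batch is emptied after the first write.
	safe := &toyStore{}
	b = newBatch()
	safe.writeAndReset(b)
	safe.writeAndReset(b)
	fmt.Println(len(safe.keys))
}
```

If the real cause is similar, the practical check is to make sure each WriteBatch is written exactly once, for example by not reusing a batch after it has been flushed.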