How to Test RPO (Recovery Point Objective)

translator_bot · June 20, 2024, 9:29pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: How is RPO tested? (RPO测试是如何测试的)

| username: xiaoqiao

Has anyone looked into how to accurately test the RPO (Recovery Point Objective) for databases (MySQL, TiDB)?

| username: xiaoqiao

Specifically, I mean an extreme test of the commonly cited RPO = 0 guarantee.

| username: TIDB-Learner

This depends on the disaster recovery deployment, for example two cities with three data centers, or three cities with five data centers. With such a deployment, achieving an RTO and RPO of 0 is very easy.
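Whether a given layout really delivers RPO = 0 depends on which failure domain is lost. As a rough sketch, assuming majority-quorum (Raft-style) replication and hypothetical replica counts per data center, the check below lists the failure domains (whole cities or single data centers) whose loss would leave fewer than a majority of replicas. The city and data center names and the function `failure_domains_that_break_quorum` are made up for illustration.

```python
from collections import Counter


def failure_domains_that_break_quorum(placement, level):
    """placement maps (city, data_center) -> replica count (hypothetical layout).
    level="city" groups replicas by city; level="dc" treats each data center alone.
    Returns the domains whose loss leaves fewer than a majority of replicas."""
    total = sum(placement.values())
    majority = total // 2 + 1
    per_domain = Counter()
    for (city, dc), replicas in placement.items():
        per_domain[city if level == "city" else dc] += replicas
    return [d for d, n in per_domain.items() if total - n < majority]


# Two cities, three data centers: 5 replicas placed 2 + 2 + 1 (assumed).
two_city = {("city-A", "dc1"): 2, ("city-A", "dc2"): 2, ("city-B", "dc3"): 1}
# Three cities, five data centers: one replica per data center (assumed).
three_city = {("city-A", "dc1"): 1, ("city-A", "dc2"): 1,
              ("city-B", "dc3"): 1, ("city-B", "dc4"): 1,
              ("city-C", "dc5"): 1}

print(failure_domains_that_break_quorum(two_city, "dc"))      # prints: []
print(failure_domains_that_break_quorum(two_city, "city"))    # prints: ['city-A']
print(failure_domains_that_break_quorum(three_city, "city"))  # prints: []
```

In the 2 + 2 + 1 layout, losing city-A leaves a single replica that may not hold the latest acknowledged writes, so that is exactly the scenario an RPO = 0 test should deliberately include.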

| username: TiDBer_JUi6UvZm

RPO is the number of records lost when a failure occurs. Currently, most Chinese domestic databases claim an RPO of 0, meaning no data is lost even when a failure occurs. My basic understanding is that this is guaranteed by the Raft consensus algorithm, which keeps the replicas consistent. When the original leader fails, writes are temporarily blocked until a new leader is elected; once the new leader is in place, the blocked writes are directed to it. In this case the client does not see an error, only a brief pause (the time it takes to switch leaders). Alternatively, client writes may fail during the leader election (a failed write is not lost data, because it was never acknowledged), and the client can keep retrying until the new leader is elected and the retry succeeds.
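The key step behind this reasoning is that Raft acknowledges a write only after a majority of replicas store it, so a new leader that wins an election with an up-to-date log already holds every acknowledged write. Below is a toy, single-term sketch of that idea; it is not TiDB/TiKV code, and the `Replica`, `write`, and `elect` names are hypothetical. Real Raft also handles terms, log repair, and retries, which are left out here.

```python
class Replica:
    """One replica in a toy, single-term Raft-like group (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.log = []          # entries in the order they were appended
        self.reachable = True  # set to False to "unplug the network cable"


def write(leader, followers, entry):
    """Append on the leader, copy to reachable followers, and acknowledge
    only if a majority of the whole group now stores the entry."""
    leader.log.append(entry)
    copies = 1
    for f in followers:
        # Toy rule: a follower accepts only when reachable and fully caught up,
        # so every follower log stays a prefix of the leader's log.
        if f.reachable and f.log == leader.log[:-1]:
            f.log.append(entry)
            copies += 1
    return copies > (1 + len(followers)) // 2  # True = client receives an ack


def elect(survivors):
    """Pick the survivor with the longest log, approximating Raft's rule that
    only a candidate with an up-to-date log can win the election."""
    return max(survivors, key=lambda r: len(r.log))


a, b, c = Replica("A"), Replica("B"), Replica("C")
acked = []
for i in range(10):
    if i == 4:
        b.reachable = False  # unplug follower B partway through the test
    if write(a, [b, c], i):
        acked.append(i)

new_leader = elect([b, c])  # leader A now loses power
lost = [e for e in acked if e not in new_leader.log]
print(new_leader.name, len(acked), lost)  # prints: C 10 []
```

If both followers are unplugged, `write` returns False: the entry sits only on the old leader and the client never receives an ack, which is the "failure does not mean data loss" case.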

Based on this understanding, I think we can design a test as follows:

1. The test program writes records to the database concurrently. Each time a write succeeds, the program also appends that record to a local file (take care to make these appends safe under concurrency).
2. While the test program is running, deliberately inject failures into the cluster: physically unplug network cables, and later also try cutting power to nodes.
3. Under each stress-test scenario, check whether the records in the local file match those in the database, and if they do not, count how many records differ. With RPO = 0, every record in the local file should be present in the database; the database may still hold a few extra records whose success acknowledgment never reached the client.
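The three steps can be sketched as a small Python harness. It is a minimal sketch under assumptions: `FakeDB` is a hypothetical stand-in for a real MySQL/TiDB connection so the code runs anywhere, its failure pattern is invented to mimic rejected writes and lost acknowledgments, and the unplugging or power-off would be done by hand (or with a fault-injection tool) while `run_writers` runs against a real cluster.

```python
import os
import tempfile
import threading


def run_writers(insert, n_threads, per_thread, log_path):
    """Step 1: concurrent writers. A record id is appended to the local log
    only after insert() returns, i.e. after the database acknowledged it."""
    lock = threading.Lock()
    with open(log_path, "w") as log:
        def worker(t):
            for i in range(per_thread):
                rid = t * per_thread + i
                try:
                    insert(rid)
                except Exception:
                    continue  # no acknowledgment, so this record is not logged
                with lock:  # serialize appends from all threads
                    log.write(f"{rid}\n")
                    log.flush()
                    os.fsync(log.fileno())  # keep the log even if the test host crashes

        threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()


def compare(log_path, db_ids):
    """Step 3: compare acknowledged ids with the ids found in the database."""
    with open(log_path) as f:
        acked = {int(line) for line in f if line.strip()}
    missing = acked - db_ids  # acknowledged but gone: real data loss
    extra = db_ids - acked    # committed, but the ack never reached the client
    return missing, extra


class FakeDB:
    """Hypothetical stand-in for MySQL/TiDB so the harness runs anywhere."""

    def __init__(self):
        self.rows = set()
        self.lock = threading.Lock()

    def insert(self, rid):
        if rid % 7 == 0:
            raise ConnectionError("write rejected during leader election")
        with self.lock:
            self.rows.add(rid)
        if rid % 11 == 0:
            raise TimeoutError("committed, but the ack was lost")


db = FakeDB()
path = os.path.join(tempfile.mkdtemp(), "acked.log")
run_writers(db.insert, n_threads=4, per_thread=25, log_path=path)
missing, extra = compare(path, db.rows)
print(len(missing), sorted(extra))  # prints: 0 [11, 22, 33, 44, 55, 66, 88, 99]
```

`missing` is the set that matters for RPO = 0 and must stay empty; a non-empty `extra` is expected after failures and does not indicate data loss. Against a real cluster, `db_ids` would come from querying the test table's id column after the cluster recovers.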

This is my basic understanding. Please correct me if there are any inaccuracies.

| username: DBAER

RPO describes how much data may be lost in various failure scenarios, expressed as a time window such as one minute or five minutes.

| username: xiaoqiao

Testing…

| username: dba远航

RPO is the recovery point: the point in time to which data can be restored after a failure.

| username: Swan

RPO is measured in time (seconds), so it cannot be the number of lost records, as one of the answers above suggests.
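To reconcile the two views, the record-level results from a test like the one proposed above can be converted into a time window. A minimal sketch, assuming each acknowledged write was logged together with its acknowledgment timestamp and that the failure time is known; `data_loss_window` and the sample numbers are hypothetical. The result can then be compared with an RPO target such as the one-minute or five-minute windows mentioned earlier.

```python
def data_loss_window(acked, surviving_ids, failure_time):
    """acked: (record_id, ack_time_in_seconds) pairs logged by the client.
    Returns how many seconds of acknowledged writes were lost before the
    failure; 0.0 means every acknowledged write survived (observed RPO = 0)."""
    lost_times = [t for rid, t in acked if rid not in surviving_ids]
    if not lost_times:
        return 0.0
    # Treat the recovery point as just before the earliest lost write; the
    # window then runs from there up to the moment of failure.
    return failure_time - min(lost_times)


acked_log = [(1, 100.0), (2, 101.5), (3, 102.0), (4, 103.2)]
window = data_loss_window(acked_log, surviving_ids={1, 2}, failure_time=104.0)
print(window, window <= 60)  # prints: 2.0 True
```

Using the earliest lost write is a conservative choice: if a later write survived while an earlier one did not, the window still counts from the first gap.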
