[TiDB Usage Environment]
Production Environment
[TiDB Version]
v6.1.0
[Encountered Problem: Phenomenon and Impact]
We currently use TiCDC to replicate data to Kafka, where messages are retained for only 7 days by default. The downstream consumer application stopped abnormally for more than 7 days, so the unconsumed messages were deleted from Kafka. I want to find the oldest TSO still retained in TiDB and have TiCDC re-capture changes from that TSO to minimize data loss.
How can I check the oldest TSO currently retained in TiDB?
In this example, the --start-ts parameter specifies the starting TSO as 425355555555555555, and TiCDC will start re-fetching data from this TSO.
I hope the above information helps you solve your problem. If you need more assistance, please provide more detailed information, and I will do my best to help you.
First, connect to TiDB with a MySQL client (or the TiDB console) and run the following command to check the GC lifetime setting:
SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';
This command displays the value of tidb_gc_life_time, which determines how long historical versions of data are kept in TiKV (the distributed storage engine) before garbage collection (GC) may clean them up. The default value is 10m0s (10 minutes). In TiDB v5.0 and later, including v6.1.0, this setting is a system variable; tikv_gc_life_time is its older name from the mysql.tidb table.
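The lifetime is returned as a Go-style duration string such as 10m0s or 24h0m0s. To use it in the calculation in the next step, it can be turned into a time span. The following is a minimal Python sketch, assuming the value only uses the h, m, s and ms units; parse_gc_life_time is a hypothetical helper, not part of TiDB.

```python
import re
from datetime import timedelta

# Milliseconds per unit for the units assumed to appear in the value.
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# "ms" is listed first so that "500ms" is read as milliseconds, not minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_gc_life_time(value: str) -> timedelta:
    """Convert a duration string such as '10m0s' into a timedelta."""
    value = value.strip()
    parts = _PART.findall(value)
    # Reject anything that is not made up entirely of number+unit pairs.
    if not parts or "".join(num + unit for num, unit in parts) != value:
        raise ValueError(f"unrecognized duration: {value!r}")
    total_ms = sum(float(num) * _UNIT_MS[unit] for num, unit in parts)
    return timedelta(milliseconds=total_ms)


print(parse_gc_life_time("10m0s"))    # 0:10:00
print(parse_gc_life_time("24h0m0s"))  # 1 day, 0:00:00
```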
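Based on the GC lifetime obtained in the previous step, estimate the time of the oldest retained TSO. Assuming the lifetime is 10m, that time is roughly the current time minus 10 minutes. This is only an approximation, because GC runs periodically; the exact lower bound is the GC safe point, which is stored as the tikv_gc_safe_point row in the mysql.tidb table. Data older than the safe point has already been garbage-collected and cannot be re-captured, so with the default 10-minute lifetime typically only the most recent 10 minutes or so of changes can be replayed. You can use the following command to get the current time: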
SELECT NOW();
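A TiDB TSO stores the physical time (milliseconds since the Unix epoch) in its high bits and an 18-bit logical counter in its low bits, so the estimate from this step can be turned into a TSO by shifting. The following is a minimal Python sketch, assuming you already know the current time with its time zone (SELECT NOW() returns the time in the session time zone) and the GC lifetime; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# A TiDB TSO = (physical milliseconds since the Unix epoch << 18) | logical counter.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_tso(dt: datetime) -> int:
    """Encode a timezone-aware datetime as a TSO with logical part 0."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    physical_ms = (dt - EPOCH) // timedelta(milliseconds=1)
    return physical_ms << LOGICAL_BITS


def estimate_oldest_tso(now: datetime, gc_life_time: timedelta) -> int:
    """Rough oldest retained TSO: now minus the GC lifetime (not the exact safe point)."""
    return datetime_to_tso(now - gc_life_time)


# Example: NOW() returned 2023-07-25 12:10:00 in UTC+8 and the lifetime is 10 minutes.
now = datetime(2023, 7, 25, 12, 10, tzinfo=timezone(timedelta(hours=8)))
print(estimate_oldest_tso(now, timedelta(minutes=10)))
```

Because this is only an estimate, compare the resulting time with the GC safe point (the tikv_gc_safe_point row in mysql.tidb) before relying on it.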
Once you have the time of the oldest retained TSO, the data changes after that point can be replicated to Kafka again with TiCDC. The specific steps are as follows:
a. Install TiCDC: Deploy a TiCDC version that matches the TiDB version (v6.1.0). Skip this step if TiCDC is already deployed.
b. Configure the sink: Specify the downstream Kafka through the --sink-uri parameter of the changefeed (a kafka:// URI). Other replication options can be put in a changefeed configuration file passed with --config.
c. Create the replication task: Use cdc cli changefeed create to create a changefeed (a TiCDC replication task).
d. Specify the start point: When creating the changefeed, pass the starting TSO with the --start-ts parameter. Note that --start-ts expects a TSO (a 64-bit integer), not a datetime string, so a time such as "2023-07-25 12:00:00" must first be converted to a TSO, and the value must not be earlier than the current GC safe point, or TiCDC will reject it.
TiCDC will then replicate data changes to Kafka starting from the specified TSO.
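Since --start-ts takes a TSO number, the chosen wall-clock time has to be converted, and it is worth checking the result against the GC safe point before creating the changefeed. The following is a minimal Python sketch; it assumes the tikv_gc_safe_point value in mysql.tidb is formatted like 20230725-12:00:00.000 +0800 (with or without milliseconds), and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # low bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_tso(dt: datetime) -> int:
    """Encode a timezone-aware datetime as a TSO (logical part 0)."""
    return ((dt - EPOCH) // timedelta(milliseconds=1)) << LOGICAL_BITS


def tso_to_datetime(tso: int) -> datetime:
    """Decode a TSO: drop the logical bits, read the rest as ms since the epoch (UTC)."""
    return EPOCH + timedelta(milliseconds=tso >> LOGICAL_BITS)


def parse_safe_point(value: str) -> datetime:
    """Parse a tikv_gc_safe_point value (assumed format: 20230725-12:00:00.000 +0800)."""
    for fmt in ("%Y%m%d-%H:%M:%S.%f %z", "%Y%m%d-%H:%M:%S %z"):
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized safe point value: {value!r}")


def is_usable_start_ts(start_ts: int, safe_point: str) -> bool:
    """Conservative check: require start_ts to be strictly later than the safe point."""
    return start_ts > datetime_to_tso(parse_safe_point(safe_point))


utc8 = timezone(timedelta(hours=8))
start_ts = datetime_to_tso(datetime(2023, 7, 25, 12, 0, 5, tzinfo=utc8))
print(start_ts)                              # candidate value for --start-ts
print(tso_to_datetime(425355555555555555))   # wall-clock time (UTC) of a given TSO
print(is_usable_start_ts(start_ts, "20230725-12:00:00.000 +0800"))  # True
```

If the check fails, pick a later start time; a start-ts earlier than the GC safe point points at data that GC may already have removed.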
Please note that TiCDC configuration and operation can be complex, so make sure you understand them well before running these steps, and try them in a test environment first. As an additional precaution, it is recommended to back up TiDB data before performing these operations.