Note:
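This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: What impact does abnormal NTP clock synchronization (clock ahead or behind) have on a running cluster, and what are the risks of correcting the clock directly to the current time?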

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】Any
【Encountered Issues: Problem Phenomenon and Impact】
TiDB is a distributed database that requires clock synchronization between nodes to guarantee linearizable transactions under the ACID model. The usual approach is NTP: nodes can synchronize against the public pool.ntp.org service over the internet or, in an offline environment, against a self-hosted NTP server.
The official documentation also stresses the importance of NTP clock synchronization in several places:
- Using pd-recover to repair metadata:
  “pd-recover does not modify TSO. Therefore, before performing this step, ensure that the local time is later than the time of the failure and confirm that the NTP clock synchronization service was enabled between PD components before the failure. If not, you need to adjust the local clock to a future time to ensure that TSO does not roll back.”
- Using the AS OF TIMESTAMP statement to read historical version data in TiDB with the Stale Read feature:
  “When using Stale Read, you need to deploy NTP services for TiDB and PD nodes to prevent the timestamp specified by TiDB from exceeding the current latest TSO allocation progress (such as a timestamp a few seconds later) or falling behind the GC safe point timestamp. When the specified timestamp exceeds the service range, TiDB will return an error.”
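To make the Stale Read constraint quoted above concrete, here is a minimal Python sketch of the range check it describes. This is not TiDB's code: the function name, the enum, and all the millisecond values are made up for illustration, and the 10-minute GC window is only an assumed example.

```python
from enum import Enum


class StaleReadCheck(Enum):
    OK = "ok"
    AHEAD_OF_TSO = "timestamp is ahead of the latest allocated TSO"
    BEHIND_GC_SAFE_POINT = "timestamp is older than the GC safe point"


def check_stale_read_ts(read_ms: int, gc_safe_point_ms: int, latest_tso_ms: int) -> StaleReadCheck:
    """Hypothetical helper: is read_ms inside [gc_safe_point_ms, latest_tso_ms]?"""
    if read_ms > latest_tso_ms:
        return StaleReadCheck.AHEAD_OF_TSO
    if read_ms < gc_safe_point_ms:
        return StaleReadCheck.BEHIND_GC_SAFE_POINT
    return StaleReadCheck.OK


# Illustrative numbers only (milliseconds since the Unix epoch).
pd_now_ms = 1_700_000_000_000                    # physical time of PD's latest TSO
gc_safe_point_ms = pd_now_ms - 10 * 60 * 1000    # assumed 10-minute GC window

# A tidb-server whose clock runs 5 s ahead of PD asks for "1 second ago"
# according to its own clock, which is still 4 s in PD's future.
skewed_tidb_now_ms = pd_now_ms + 5_000
read_ms = skewed_tidb_now_ms - 1_000
skewed_result = check_stale_read_ts(read_ms, gc_safe_point_ms, pd_now_ms)

# With synchronized clocks the same request falls inside the valid range.
synced_result = check_stale_read_ts(pd_now_ms - 1_000, gc_safe_point_ms, pd_now_ms)

print(skewed_result.value)
print(synced_result.value)
```

In this sketch, a clock on the TiDB node that runs ahead pushes a seemingly valid "one second ago" request past the latest TSO, which is the situation the documentation says NTP is meant to prevent.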
When a clock problem occurs:
- On a tidb-server node: could statements such as select now(), which read the current system time, return abnormal results?
- On a PD node: the component most closely tied to the clock is the PD leader. Its TSO consists of a physical clock part and a logical clock part, and TSO is used for transaction IDs, commit timestamps, and other critical values. If the PD leader's machine clock has run ahead and is then set back from the future to the present, could TSOs be duplicated? Could an earlier and a later transaction end up with the same ID and corrupt data?
- On a TiKV node: are there similar risks?
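The PD question above can be illustrated with a small Python sketch of a timestamp built from a physical part and a logical part. It is a toy model, not PD's actual implementation: both allocator classes are hypothetical, and the 18-bit logical width is an assumption about the TSO layout made for the example.

```python
from typing import Tuple

LOGICAL_BITS = 18                       # assumed width of the logical counter
MAX_LOGICAL = (1 << LOGICAL_BITS) - 1


def compose_tso(physical_ms: int, logical: int) -> int:
    """Pack physical milliseconds (high bits) and a logical counter (low bits)."""
    return (physical_ms << LOGICAL_BITS) | logical


def parse_tso(tso: int) -> Tuple[int, int]:
    """Split a timestamp back into (physical_ms, logical)."""
    return tso >> LOGICAL_BITS, tso & MAX_LOGICAL


class NaiveAllocator:
    """Hypothetical allocator that trusts the wall clock on every call."""

    def __init__(self) -> None:
        self.physical_ms = 0
        self.logical = 0

    def next_tso(self, wall_ms: int) -> int:
        if wall_ms == self.physical_ms:
            self.logical += 1           # overflow ignored in this toy model
        else:
            # Follows the clock even when it moves backward.
            self.physical_ms, self.logical = wall_ms, 0
        return compose_tso(self.physical_ms, self.logical)


class MonotonicAllocator:
    """Hypothetical allocator that never lets the physical part go backward."""

    def __init__(self) -> None:
        self.physical_ms = 0
        self.logical = 0

    def next_tso(self, wall_ms: int) -> int:
        if wall_ms > self.physical_ms:
            self.physical_ms, self.logical = wall_ms, 0
        elif self.logical < MAX_LOGICAL:
            self.logical += 1           # clock stalled or went back: bump logical
        else:
            self.physical_ms += 1       # logical exhausted: borrow one millisecond
            self.logical = 0
        return compose_tso(self.physical_ms, self.logical)


# The wall clock reads 1000, 1000, 1001, then is set back to 1000.
clock_readings = [1000, 1000, 1001, 1000]
naive, mono = NaiveAllocator(), MonotonicAllocator()
naive_out = [naive.next_tso(ms) for ms in clock_readings]
mono_out = [mono.next_tso(ms) for ms in clock_readings]

print([parse_tso(t) for t in naive_out])
print([parse_tso(t) for t in mono_out])
```

In this toy model the naive allocator reissues a timestamp it already handed out once the clock is set back, while the monotonic variant keeps increasing by holding the physical part and advancing the logical counter. Whether and how a real PD leader guards against this is exactly what the question asks, so the sketch only illustrates the risk.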