What impact do NTP clock synchronization anomalies (clock ahead or behind) have on cluster operation? What are the risks of correcting the clock directly to the current time?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ntp时钟同步异常(提前、落后)对集群运行有何影响?直接修复到当前时刻有何风险?

| username: Jellybean

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】Any
【Encountered Issues: Problem Phenomenon and Impact】

TiDB is a distributed database system that requires time synchronization between nodes to guarantee linearizable transactions under the ACID model. The common approach is NTP: nodes can synchronize against the public pool.ntp.org service on the internet, or against a self-hosted NTP service in an offline environment.

Additionally, the official documentation frequently mentions the importance of NTP clocks:

  1. Using pd-recover to repair metadata:
    “pd-recover does not modify TSO. Therefore, before performing this step, ensure that the local time is later than the time of the failure and confirm that the NTP clock synchronization service was enabled between PD components before the failure. If not, you need to adjust the local clock to a future time to ensure that TSO does not roll back.”
  2. Using the AS OF TIMESTAMP statement to read historical version data in TiDB with the Stale Read feature:
    “When using Stale Read, you need to deploy NTP services for TiDB and PD nodes to prevent the timestamp specified by TiDB from exceeding the current latest TSO allocation progress (such as a timestamp a few seconds later) or falling behind the GC safe point timestamp. When the specified timestamp exceeds the service range, TiDB will return an error.”

When clock issues occur:

  1. If the affected node is a tidb-server, could commands such as select now(), which read the current system time, return anomalous results?
  2. If the affected node is a PD, the component most closely tied to the clock is the PD leader. Its TSO consists of a physical clock and a logical clock, and TSO values serve as transaction IDs, data commit timestamps, and other key information. If the PD leader’s machine clock ran ahead and is then adjusted back from the future to the present, could TSOs be duplicated? Could earlier and later transactions end up with the same ID, leading to data confusion?
  3. If the affected node is a TiKV machine, are there similar risks?

| username: 春风十里 | Original post link

This is a good question and worth deep consideration. I am not particularly proficient yet, but here is how the official documentation describes TSO.

The following example shows the binary details of the TSO timestamp:

0000011000101000111000010001011110111000110111000000000000000100  ← This value is the binary form of 443852055297916932
0000011000101000111000010001011110111000110111                    ← The first 46 bits are the physical timestamp
                                              000000000000000100  ← The last 18 bits are the logical timestamp

The TSO timestamp consists of two parts:

  • Physical timestamp: UNIX timestamp since January 1, 1970, in milliseconds.
  • Logical timestamp: Incrementing counter, used when multiple timestamps are needed within one millisecond or when certain events reverse the clock’s progression. In these cases, the physical timestamp remains unchanged while the logical timestamp keeps incrementing. This mechanism preserves the integrity of the TSO timestamp: it always moves forward and never rolls back.

You can inspect the TSO timestamp in more detail with SQL, as shown in the example below:

SELECT @ts, UNIX_TIMESTAMP(NOW(6)), (@ts >> 18)/1000, FROM_UNIXTIME((@ts >> 18)/1000), NOW(6), @ts & 0x3FFFF\G
*************************** 1. row ***************************
                            @ts: 443852055297916932
         UNIX_TIMESTAMP(NOW(6)): 1693161835.502954
               (@ts >> 18)/1000: 1693161221.6870
FROM_UNIXTIME((@ts >> 18)/1000): 2023-08-27 20:33:41.6870
                         NOW(6): 2023-08-27 20:43:55.502954
                  @ts & 0x3FFFF: 4
1 row in set (0.00 sec)
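
The same split can be reproduced outside the database. Below is a minimal Python sketch, assuming only the 46-bit/18-bit layout quoted above; the helper name `split_tso` is hypothetical and not part of any TiDB client library.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

LOGICAL_BITS = 18                        # low 18 bits: logical counter
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1   # 0x3FFFF
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def split_tso(tso: int) -> Tuple[int, int]:
    """Return (physical_ms, logical) for a TSO value (hypothetical helper)."""
    return tso >> LOGICAL_BITS, tso & LOGICAL_MASK

physical_ms, logical = split_tso(443852055297916932)
print(physical_ms, logical)  # 1693161221687 4

# Physical part as a UTC datetime; the SQL output above shows the
# same instant in the server's local time zone.
print(EPOCH + timedelta(milliseconds=physical_ms))
```

The shift and mask mirror `@ts >> 18` and `@ts & 0x3FFFF` in the SQL statement.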

The first 46 bits are the physical timestamp. So does a time rollback mean that the TSO will become smaller, thus causing duplication?
If so, it feels quite dangerous.
This reminds me of a common bug in Oracle 10g, where time rollback could potentially cause RAC nodes to restart.

| username: tidb菜鸟一只 | Original post link

It can be postponed; moving it forward carries risks.

| username: wangccsy | Original post link

It may cause data synchronization risks.

| username: oceanzhang | Original post link

If the time difference is not significant, you can stop the system to adjust the time.

| username: 哈喽沃德 | Original post link

It’s fine to delay it, but there will be problems if it’s moved forward.

| username: 随缘天空 | Original post link

NTP clock anomalies can cause computer network systems to become unstable, which in turn can affect the proper operation of various applications. This is especially true for data backup and recovery; if the clocks are not synchronized, data inconsistency across nodes can easily lead to data recovery failures.

| username: chenhanneu | Original post link

If the time is adjusted backward, the TSO does not move backward. For example, if the time is adjusted from 18:00 to 17:30, the TSO keeps increasing slowly from 18:00 by consuming the logical part. The logical part allows about 260 million timestamps per second (2^18 = 262,144 per millisecond); when those are exhausted, the TSO pushes the physical part forward by the smallest time unit instead of following the wall clock as usual. Once the wall clock, advancing from 17:30, catches up with the point the TSO has reached, for example 18:01:00, the TSO resumes growing normally.
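
The behavior described in this reply can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyTSO` and its `next` method are hypothetical, the wall clock is passed in as a plain millisecond value, and PD's real allocator may behave differently (for instance, it may make requests wait when the logical part runs out).

```python
LOGICAL_BITS = 18
MAX_LOGICAL = 1 << LOGICAL_BITS  # 262,144 logical values per physical millisecond

class ToyTSO:
    """Toy model of a monotonic timestamp allocator (not PD's real code)."""

    def __init__(self) -> None:
        self.physical_ms = 0
        self.logical = 0

    def next(self, wall_clock_ms: int) -> int:
        if wall_clock_ms > self.physical_ms:
            # Normal case: the wall clock is ahead, so follow it.
            self.physical_ms = wall_clock_ms
            self.logical = 0
        else:
            # Clock stalled or rolled back: keep the old physical part
            # and consume the logical counter instead.
            self.logical += 1
            if self.logical >= MAX_LOGICAL:
                # Logical part exhausted: advance by the smallest unit (1 ms).
                self.physical_ms += 1
                self.logical = 0
        return (self.physical_ms << LOGICAL_BITS) | self.logical

MINUTE_MS = 60_000
tso = ToyTSO()
at_1800 = tso.next(18 * 60 * MINUTE_MS)               # wall clock reads 18:00
after_rollback = tso.next((17 * 60 + 30) * MINUTE_MS)  # clock set back to 17:30
caught_up = tso.next((18 * 60 + 1) * MINUTE_MS)        # clock reaches 18:01

print(at_1800 < after_rollback < caught_up)  # True
print(after_rollback - at_1800)              # 1
```

In this model the timestamp issued after the rollback is exactly one logical step above the previous one, and the physical part follows the wall clock again only once the clock has passed the stored physical value.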

| username: Jellybean | Original post link

Moreover, the storage layer has already stored a lot of data, and the commit information, various IDs, and keys may all contain TSO information. After a rollback, it could lead to duplicate content and some abnormal issues. If it involves critical system metadata, it’s uncertain whether it might directly cause the cluster to become unavailable. These are the thoughts that came to mind, so I raised this issue to discuss with everyone.

| username: 春风十里 | Original post link

I see that TiDB’s documentation has already considered the situation of time rollback. The physical time will only be updated if it is greater than the old physical time. So theoretically, the physical time should not roll back.

| username: 春风十里 | Original post link

I tested this in a single-machine test environment: regardless of whether the time is adjusted forward or backward, the TSO keeps increasing.


MySQL [(none)]> select version();
+--------------------+
| version()          |
+--------------------+
| 8.0.11-TiDB-v7.5.0 |
+--------------------+
1 row in set (0.01 sec)

MySQL [(none)]> 

MySQL [(none)]> BEGIN; SET @ts := @@tidb_current_ts; ROLLBACK;
Query OK, 0 rows affected (0.21 sec)

Query OK, 0 rows affected (0.55 sec)

Query OK, 0 rows affected (0.02 sec)

MySQL [(none)]> SELECT @ts;
+--------------------+
| @ts                |
+--------------------+
| 446974111762087941 |
+--------------------+
1 row in set (0.20 sec)

MySQL [(none)]> SELECT @ts, UNIX_TIMESTAMP(NOW(6)), (@ts >> 18)/1000, FROM_UNIXTIME((@ts >> 18)/1000), NOW(6), @ts & 0x3FFFF\G
*************************** 1. row ***************************
                            @ts: 446974111762087941
         UNIX_TIMESTAMP(NOW(6)): 1705070942.989248
               (@ts >> 18)/1000: 1705070921.9440
FROM_UNIXTIME((@ts >> 18)/1000): 2024-01-12 22:48:41.9440
                         NOW(6): 2024-01-12 22:49:02.989248
                  @ts & 0x3FFFF: 5
1 row in set (0.75 sec)

MySQL [(none)]> exit
Bye
[root@localhost ~]# date -s 20240122
Mon Jan 22 00:00:00 CST 2024
[root@localhost ~]# mylogin
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 507513162
Server version: 8.0.11-TiDB-v7.5.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 8.0 compatible

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> BEGIN; SET @ts := @@tidb_current_ts; ROLLBACK;
Query OK, 0 rows affected (0.29 sec)

Query OK, 0 rows affected (0.11 sec)

Query OK, 0 rows affected (0.11 sec)

MySQL [(none)]> SELECT @ts, UNIX_TIMESTAMP(NOW(6)), (@ts >> 18)/1000, FROM_UNIXTIME((@ts >> 18)/1000), NOW(6), @ts & 0x3FFFF\G
*************************** 1. row ***************************
                            @ts: 447179080218968065
         UNIX_TIMESTAMP(NOW(6)): 1705852821.146757
               (@ts >> 18)/1000: 1705852814.5560
FROM_UNIXTIME((@ts >> 18)/1000): 2024-01-22 00:00:14.5560
                         NOW(6): 2024-01-22 00:00:21.146757
                  @ts & 0x3FFFF: 1
1 row in set (0.13 sec)

MySQL [(none)]> exit
Bye
[root@localhost ~]# date -s 20230101
Sun Jan  1 00:00:00 CST 2023
[root@localhost ~]# mylogin
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 507513164
Server version: 8.0.11-TiDB-v7.5.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 8.0 compatible

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> BEGIN; SET @ts := @@tidb_current_ts; ROLLBACK;
Query OK, 0 rows affected (0.08 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

MySQL [(none)]> SELECT @ts, UNIX_TIMESTAMP(NOW(6)), (@ts >> 18)/1000, FROM_UNIXTIME((@ts >> 18)/1000), NOW(6), @ts & 0x3FFFF\G
*************************** 1. row ***************************
                            @ts: 447179087651799401
         UNIX_TIMESTAMP(NOW(6)): 1672502421.509376
               (@ts >> 18)/1000: 1705852842.9100
FROM_UNIXTIME((@ts >> 18)/1000): 2024-01-22 00:00:42.9100
                         NOW(6): 2023-01-01 00:00:21.509376
                  @ts & 0x3FFFF: 361
1 row in set (0.01 sec)

MySQL [(none)]> select now();
+---------------------+
| now()               |
+---------------------+
| 2023-01-01 00:02:54 |
+---------------------+
1 row in set (0.00 sec)

MySQL [(none)]> exit
[root@localhost ~]# date -s 20240122
Mon Jan 22 00:00:00 CST 2024
[root@localhost ~]# date -s 22:55
Mon Jan 22 22:55:00 CST 2024
[root@localhost ~]# mylogin
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 507513206
Server version: 8.0.11-TiDB-v7.5.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 8.0 compatible

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> BEGIN; SET @ts := @@tidb_current_ts; ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

MySQL [(none)]> SELECT @ts, UNIX_TIMESTAMP(NOW(6)), (@ts >> 18)/1000, FROM_UNIXTIME((@ts >> 18)/1000), NOW(6), @ts & 0x3FFFF\G
*************************** 1. row ***************************
                            @ts: 447200705026457602
         UNIX_TIMESTAMP(NOW(6)): 1705935308.874718
               (@ts >> 18)/1000: 1705935306.6500
FROM_UNIXTIME((@ts >> 18)/1000): 2024-01-22 22:55:06.6500
                         NOW(6): 2024-01-22 22:55:08.874718
                  @ts & 0x3FFFF: 2
1 row in set (0.00 sec)

| username: 小龙虾爱大龙虾 | Original post link

When there is a time rollback in PD, the TSO will not roll back but will advance the logical part. If there are too many TSO requests in a short period, it may lead to insufficient logical bits. At this point, it is necessary to wait for an interval to update the physical part, which will result in higher latency for TSO requests.

Testing is needed for TiKV and PD.

| username: TIDB-Learner | Original post link

For an in-depth understanding of this issue, it is recommended to read books related to distributed systems. For example, the chapters on time, clocks, and event ordering.

| username: 春风十里 | Original post link

Although a time rollback does not cause TSO rollback, it can still trigger logical issues. For example, in the following experiment I manually set the clock forward to January 22, 2024 (the actual date of the experiment was January 14, 2024) and then rolled it back. The TSO time had jumped far ahead as a result, and a snapshot backup then failed with an error.

Reference Experiment

MySQL [(none)]> SELECT TIDB_PARSE_TSO(447201033313058816);
+------------------------------------+
| TIDB_PARSE_TSO(447201033313058816) |
+------------------------------------+
| 2024-01-22 23:15:58.964000         |
+------------------------------------+
1 row in set (0.00 sec)

MySQL [(none)]> exit
Bye
[root@localhost ~]# date
Sun Jan 14 22:53:16 CST 2024
[root@localhost ~]# tiup br backup full \
>     --pd "192.168.0.100:2379" \
>     --backupts '2024-01-14 22:46:00' \
>     --storage "local:///tmp/backup5" \
>     --ratelimit 128 \
>     --log-file backupfull20240114.log
tiup is checking updates for component br ...
Starting component `br`: /root/.tiup/components/br/v7.5.0/br backup full --pd 192.168.0.100:2379 --backupts 2024-01-14 22:46:00 --storage local:///tmp/backup5 --ratelimit 128 --log-file backupfull20240114.log

Detail BR log in backupfull20240114.log 
[2024/01/14 22:53:41.761 +08:00] [WARN] [backup.go:312] ["setting `--ratelimit` and `--concurrency` at the same time, ignoring `--concurrency`: `--ratelimit` forces sequential (i.e. concurrency = 1) backup"] [ratelimit=134.2MB/s] [concurrency-specified=4]
[2024/01/14 22:53:46.255 +08:00] [INFO] [collector.go:77] ["Full Backup failed summary"] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0]
Error: GC safepoint 447201033313058816 exceed TS 447019367792640000: [BR:Backup:ErrBackupGCSafepointExceeded]backup GC safepoint exceeded
[root@localhost ~]# 

| username: Jellybean | Original post link

It looks like the GC safepoint has been advanced to a future time, 2024-01-22 15:15:58.964000, which is later than the requested backup timestamp, 2024-01-14 14:46:00.000000 (both in UTC), causing the backup to fail.

You can check the GC time of the entire cluster system to see if it has been advanced to a future time point.
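
The two values in the BR error message can be decoded to confirm which one is later. A minimal Python sketch, assuming the 18-bit logical suffix described in the TSO documentation; `tso_to_utc` is a hypothetical helper, not a BR or TiDB function.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def tso_to_utc(tso: int) -> datetime:
    """Convert the physical part of a TSO to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=tso >> LOGICAL_BITS)

gc_safepoint = 447201033313058816  # "GC safepoint" in the BR error message
backup_ts = 447019367792640000     # "TS" in the BR error message

print(tso_to_utc(gc_safepoint))  # 2024-01-22 15:15:58.964000+00:00
print(tso_to_utc(backup_ts))     # 2024-01-14 14:46:00+00:00
print(gc_safepoint > backup_ts)  # True
```

The decoded values are in UTC; in CST (UTC+8) they correspond to the 23:15:58.964 shown by TIDB_PARSE_TSO and the 22:46:00 passed to --backupts.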

| username: YuchongXU | Original post link

You can configure NTP to gradually adjust the time.

| username: ShawnYan | Original post link

There might be an issue with TSO duplication.

The hit rate issue poses a risk.

NTP synchronization on the PD nodes should also be a monitoring item.

| username: Lily2025 | Original post link

There is a known issue related to clock adjustment: “sysbench failed with schema out of date when we try to push the time of PD-server forward 5 minutes” (pingcap/tidb issue #38248 on GitHub).

| username: ShawnYan | Original post link

This is tricky: there are problems whether the clock is moved forward or backward.

| username: Jellybean | Original post link

Yes, it feels like if this issue is not controlled well, it can easily cause problems.