Every 10 minutes my in-flight stale reads fail

emchristiansen · April 7, 2023, 4:00pm

Hey TiDB folks,

I hoping you can help with a puzzling problem: every ~10 minutes my in-flight stale reads fail.

Here’s a dashboard graph illustrating it (focus on “Failed Queries”):

The clients get messages like this from TiDB when the queries fail:

Query Error: error returned from database: 9006 (HY000): GC life time is shorter than transaction duration, transaction starts at 2023-04-07 13:27:42.135 +0000 UTC, GC safe point is 2023-04-07 13:28:22.687 +0000 UTC

Interestingly, this only happens for these two queries, which are both stale reads.
Here’s the first one:

SELECT
  `pkh`
FROM
  `allocated_account` AS of timestamp `tidb_bounded_staleness` (NOW () - INTERVAL ? SECOND, NOW ())
WHERE
  `pkh4` IN (...)

Here’s the second:

SELECT
  `sk`
FROM
  `sk` AS of timestamp `tidb_bounded_staleness` (NOW () - INTERVAL ? SECOND, NOW ())
WHERE
  `pkh4` IN (...)

Details:

I’m using 300 seconds as my max staleness and the GC time is set at the default (10 minutes). The ... in the clause WHERE pkh4 in (...) is around 1K elements, each of size 4 bytes.
I have configured TiDB to send reads only to local followers as is detailed in this thread.
I believe my client maintains at most 8 in-flight queries at any given point, and they all appear to be processed promptly.

Do you have any idea what might be happening?
It seems like it is clearly related to the GC timeout, but I haven’t observed any client-side slowdown consistent with super-slow queries exceeding 10 minutes.

I’m attaching the TiDB Dashboard Diagnosis Report to provide more details.

Thanks!

TiDB Dashboard Diagnosis Report.html (619.6 KB)

emchristiansen · April 10, 2023, 5:22pm

I solved the problem by setting the max connection lifetime to 60 seconds in my client.
I used SeaORM and the relevant parameter is called max_lifetime.

The final correct connection logic (in Rust with SeaORM) is:

let url = format!("mysql://root:root@{}:{}/play", host, port);
let mut options = ConnectOptions::new(url);
options.max_lifetime(Duration::from_secs(60));
let db = sea_orm::Database::connect(options).await?;

Note there is definitely a bug in either SeaORM or TiDB, because omitting options.max_lifetime(Duration::from_secs(60)); causes periodic failures as described above.

winkyao · April 12, 2023, 3:35am

emchristiansen:

Query Error: error returned from database: 9006 (HY000): GC life time is shorter than transaction duration, transaction starts at 2023-04-07 13:27:42.135 +0000 UTC, GC safe point is 2023-04-07 13:28:22.687 +0000 UTC

Thanks for your feedback, I will ask TiDB developer to deep dive into this issue further.

winkyao · April 12, 2023, 6:00am

Could you provide the TiDB log(all or containing GC leader worker) that covers the time range of the error request and a full GC interval of 10 minutes?