Error Occurred When Querying TiDB with Trino

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Trino查询Tidb出现以下错误

| username: TIDB救我狗命

[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
6.0.1
[Reproduction Path] What operations were performed when the issue occurred
Trino querying TiDB
[Encountered Issue: Issue Phenomenon and Impact]

| username: ffeenn | Original post link

Run display to check the cluster status.

| username: TIDB救我狗命 | Original post link

Another new error has appeared.

| username: TIDB救我狗命 | Original post link

Currently, I can’t see the cluster status. I might need to wait for the operations team to check it. A few days ago, we upgraded the TiDB cluster, and now the situation is that some tables can be queried, while others are experiencing the aforementioned issue.

| username: TIDB救我狗命 | Original post link

Suddenly, everything can be queried again… so strange.

| username: TIDB救我狗命 | Original post link

TiDB is using version 6.0.1

| username: ffeenn | Original post link

Suspect that TiKV has crashed and there is data corruption. Need to check the specific cluster status.
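
One way to look at the store status is to read the store list that PD reports and flag any TiKV store that is not in the Up state. The sketch below is only illustrative: the sample JSON is hypothetical data shaped like the output of the PD stores listing (a "stores" array whose entries carry "store" and "status" objects), and the addresses and counts are made up.

```python
import json

# Hypothetical sample, assumed to mirror the shape of PD's store listing.
# The IDs, addresses, and counts below are invented for illustration.
SAMPLE = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "tikv-1:20160", "state_name": "Up"},
     "status": {"leader_count": 1200, "region_count": 3600}},
    {"store": {"id": 2, "address": "tikv-2:20160", "state_name": "Disconnected"},
     "status": {"leader_count": 0, "region_count": 3600}},
    {"store": {"id": 3, "address": "tikv-3:20160", "state_name": "Up"},
     "status": {"leader_count": 2400, "region_count": 3600}}
  ]
}
"""

def unhealthy_stores(payload: dict) -> list:
    """Return (id, address, state) for every store whose state is not Up."""
    result = []
    for entry in payload.get("stores", []):
        store = entry["store"]
        if store.get("state_name") != "Up":
            result.append((store["id"], store["address"], store["state_name"]))
    return result

if __name__ == "__main__":
    for store_id, address, state in unhealthy_stores(json.loads(SAMPLE)):
        print(f"store {store_id} at {address} is {state}")
```

A store that stays out of the Up state, or keeps dropping to zero leaders, would be the first place to look in the TiKV logs.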

| username: TIDB救我狗命 | Original post link

Yesterday, the query failed at 00:16, and TiKV restarted once at 00:31. This morning, a query failed at 09:12, and between 09:12 and 09:27 some queries succeeded while others failed. TiKV then restarted once at 09:27. The timing doesn’t seem to match up.
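
To make the timing comparison concrete, here is a small Python sketch that computes the gap between each reported query failure and the TiKV restart that followed. Only the four clock times come from the post above; the labels and function names are illustrative.

```python
from datetime import datetime

# Clock times taken from the post above. Dates are omitted because each
# failure/restart pair falls on the same day.
EVENTS = [
    ("yesterday", "00:16", "00:31"),
    ("this morning", "09:12", "09:27"),
]

def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes from start to end (both HH:MM, same day)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

def failure_to_restart_gaps(events) -> dict:
    """Map each label to the minutes between the query failure and the restart."""
    return {
        label: minutes_between(failed, restarted)
        for label, failed, restarted in events
    }

if __name__ == "__main__":
    for label, gap in failure_to_restart_gaps(EVENTS).items():
        print(f"{label}: TiKV restarted {gap} minutes after the failed query")
```

In both cases the failure comes 15 minutes before the restart, so the query errors precede the restart rather than follow it.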

| username: ffeenn | Original post link

Check the TiDB and TiKV logs in detail to determine the cause of the failure. For recovery, back up the data first, then try scaling in this TiKV node and scaling out a new one.

| username: TIDB救我狗命 | Original post link

I asked the operations team, and the restart was due to an OOM (Out of Memory) event.

| username: TIDB救我狗命 | Original post link

The operations team said it seems like the page size is insufficient.

| username: TIDB救我狗命 | Original post link

I found more detailed logs.

| username: WalterWj | Original post link

You have quite a lot of Regions. It would be best to adjust the heartbeat and enable Region merge and cross-table Region merge.

If you find that the merge is not happening, you need to increase the scheduling-related and merge-related parameters.
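
As a rough illustration of the merge-related settings, the Python sketch below prints pd-ctl commands for a few PD scheduling parameters. The parameter names are PD configuration items, but the values, the PD address, and the helper names are assumptions chosen for illustration and should be tuned per cluster. The heartbeat side is a TiKV setting (for example Hibernate Region or raftstore.raft-base-tick-interval) and is not covered here.

```python
# Illustrative PD scheduling settings for Region merge. The values are
# placeholders, not recommendations for any particular cluster.
MERGE_SETTINGS = {
    "max-merge-region-size": 20,       # MiB; smaller Regions may be merged
    "max-merge-region-keys": 200000,   # Regions with fewer keys may be merged
    "merge-schedule-limit": 8,         # concurrent merge scheduling tasks
    "enable-cross-table-merge": True,  # allow merging Regions across tables
}

def format_value(value) -> str:
    """Render a Python value the way pd-ctl expects it on the command line."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def pd_ctl_commands(settings, version="v6.0.1", pd_addr="http://127.0.0.1:2379"):
    """Build one `config set` command per setting (pd_addr is an assumed address)."""
    prefix = f"tiup ctl:{version} pd -u {pd_addr}"
    return [
        f"{prefix} config set {name} {format_value(value)}"
        for name, value in settings.items()
    ]

if __name__ == "__main__":
    for command in pd_ctl_commands(MERGE_SETTINGS):
        print(command)
```

Printing the commands instead of running them keeps the sketch safe to try; each line can be reviewed before being run against the real PD endpoint.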

| username: TIDB救我狗命 | Original post link

Oh, okay, but it seems that the issue I’m encountering is that the leader cannot be found… So strange.

| username: TIDB救我狗命 | Original post link

| username: WalterWj | Original post link

It might be a scheduling issue. Follow my suggestion and reduce the pressure on TiKV first; the excessive heartbeats are a problem in themselves.