Dumpling Backup Reports Error 9005: Region is Unavailable

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dumpling备份报 Error 9005: Region is unavailable

| username: TiDBer_OB4kHrS7

[TiDB Usage Environment] Production Environment
[TiDB Version] V7.5.0
[Reproduction Path] Dumpling backup database
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Log/Monitoring]
When backing up with Dumpling, the backup fails with Error 9005: Region is unavailable

| username: 像风一样的男子 | Original post link

Does running this SELECT statement on its own also produce the error?

| username: ffeenn | Original post link

Are all the nodes functioning properly?

| username: TiDBer_OB4kHrS7 | Original post link

When I run it on its own, it keeps running and reports no errors.

| username: TiDBer_OB4kHrS7 | Original post link

From the cluster status, all nodes are normal.

| username: 像风一样的男子 | Original post link

Generally, the “Region is unavailable” error is due to disk issues.
Take a look at this:

| username: ffeenn | Original post link

Also, check the Errors panels on the TiKV-Details monitoring dashboard for any errors.

| username: ffeenn | Original post link

You can also check out this article, Column - Summary of Troubleshooting Region is Unavailable | TiDB Community

| username: tidb菜鸟一只 | Original post link

Check the TiDB and TiKV logs to see what caused it.

| username: TiDBer_OB4kHrS7 | Original post link

I couldn’t find the tikv-ctl command.
If the command is available, can it only be run on a TiKV node?

| username: TiDBer_OB4kHrS7 | Original post link

Logs in TiKV

| username: TiDBer_OB4kHrS7 | Original post link

The second backup attempt succeeded.

| username: Kongdom | Original post link

:flushed: Lucky you. Maybe the cluster was busy and couldn’t reach the Region the first time. We weren’t so lucky and lost data from several tables outright.

| username: 像风一样的男子 | Original post link

Keep monitoring the cluster status and the errors on each node. Also, pay attention when you see “Region is unavailable.”

| username: tidb菜鸟一只 | Original post link

This means TiKV is too busy, and the connection was dropped after waiting too long. The backup might succeed when it’s not busy.

| username: zhang_2023 | Original post link

What’s going on?

| username: TiDBer_OB4kHrS7 | Original post link

This instance hosts dozens of databases, and whenever we run a backup we hit “Region is unavailable” errors. Sometimes the backup cannot complete in one go and takes several attempts. We could not find the command you sent yesterday for checking Regions, so we cannot confirm whether there are any bad Regions.

| username: GreenGuan | Original post link

Check the cluster status to see if the load on the cluster increased during the backup, and also check the health of the regions.

| username: TiDBer_OB4kHrS7 | Original post link

The load will definitely increase. After all, there are several terabytes of data to back up at once. How do you check the health status of the Regions?
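
One possible way to look at Region health without tikv-ctl is to ask PD which Regions it currently flags as abnormal. The sketch below is not from the thread: it assumes PD's HTTP API exposes the same checks as `pd-ctl region check` under `/pd/api/v1/regions/check/<kind>`, that the response is a JSON object with `count` and `regions` fields, and that PD listens on the hypothetical address `http://127.0.0.1:2379`. Verify these against your PD version before relying on it.

```python
import json
import urllib.request

# Check kinds assumed to match `pd-ctl region check <kind>`.
CHECK_KINDS = ("miss-peer", "extra-peer", "down-peer", "pending-peer")


def fetch_region_check(pd_addr: str, kind: str, timeout: float = 10.0) -> dict:
    """Fetch one Region check result from PD (endpoint path is an assumption)."""
    url = f"{pd_addr}/pd/api/v1/regions/check/{kind}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def count_regions(payload: dict) -> int:
    """Number of Regions in a check response, tolerating missing or null fields."""
    regions = payload.get("regions") or []
    count = payload.get("count")
    return int(count) if count is not None else len(regions)


def unhealthy_summary(results: dict) -> dict:
    """Keep only the check kinds that report at least one Region."""
    summary = {}
    for kind, payload in results.items():
        n = count_regions(payload)
        if n > 0:
            summary[kind] = n
    return summary


def report(pd_addr: str) -> dict:
    """Query every check kind and return the non-empty ones."""
    results = {kind: fetch_region_check(pd_addr, kind) for kind in CHECK_KINDS}
    return unhealthy_summary(results)
```

Calling `report("http://127.0.0.1:2379")` before and during a backup would show whether PD sees Regions with missing, down, or pending peers at the time the error appears. An empty result means PD reports nothing under these four checks, not that the cluster is under no load.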

| username: knull | Original post link

In principle, when TiDB hits a Region-related error while executing a query, it retries automatically. If the retries time out, it returns “Region is unavailable,” which means the Region has been unreachable for a long time. So I recommend checking the status of the TiDB cluster. Also, I don’t know what your Dumpling command looks like or how the machines are configured. If the concurrency is too high for a modestly configured cluster, that can also drive up cluster load and cause anomalies.
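
The two ideas in this thread, lowering Dumpling's concurrency and simply trying again when the cluster is busy, can be combined in a small wrapper. This is a minimal sketch under several assumptions, not the poster's actual setup: the host, port, user, and output directory are hypothetical; it assumes Dumpling exits with a non-zero status on failure and that the error text appears in its output; and the thread count and delays are placeholders to tune. It does not handle passwords or clean up a partially written output directory between attempts.

```python
import subprocess
import time

RETRYABLE = "Region is unavailable"


def build_dumpling_cmd(host, port, user, output_dir, threads=2):
    """Build a Dumpling command line with a low thread count (-t)."""
    return ["dumpling", "-h", host, "-P", str(port), "-u", user,
            "-t", str(threads), "-o", output_dir]


def run_with_retry(cmd, max_attempts=3, base_delay=30.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff only on the retryable error.

    Returns the attempt number that succeeded. Raises RuntimeError on any
    other failure or when all attempts are used up.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        output = (result.stderr or "") + (result.stdout or "")
        if RETRYABLE not in output:
            # An unrelated failure will not be fixed by waiting.
            raise RuntimeError(f"backup failed with a non-retryable error: {output[-500:]}")
        if attempt < max_attempts:
            # Back off so a busy cluster has time to recover: 30s, 60s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"backup still failing after {max_attempts} attempts")
```

For example, `run_with_retry(build_dumpling_cmd("127.0.0.1", 4000, "root", "/data/backup"))` would run the export with two threads and wait 30 and then 60 seconds between attempts. Retrying only hides the symptom; if the error shows up on every backup, the cluster load and Region status during the backup window still need to be investigated.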