Regarding the issue of "connect: connection refused" when exporting large amounts of data from TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于tidb导出大量数据,出现 connect: connection refused的问题

| username: 末0_0想

[TiDB Usage Environment] Testing
[TiDB Version] v6.5.0
[Reproduction Path]

tiup dumpling -u root -p'HGe34545e9' -P 4000 -h 10.18.104.156 --filetype sql -t 4 -L ./baklog.log -o ./bak -r 200000 -F256MiB



I need some guidance. While I was exporting data, Dumpling reported “connect: connection refused”: the connection to the database failed and the backup failed. Can someone explain what might be causing this? The runtime log is below.
During the export I monitored CPU and memory: CPU usage was around 20%, and memory usage was around 90%.
baklog.log (33.8 KB)
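For reference, CPU and memory on the host can be watched while the export runs using standard Linux tools, for example:

# Overall memory and swap usage, refreshed every 2 seconds
free -h -s 2

# Per-process CPU/memory, sorted by memory usage
top -o %MEM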

| username: tidb菜鸟一只 | Original post link

Check the status of TiDB to see if it has restarted…

| username: 末0_0想 | Original post link

How can I check whether TiDB has restarted? All I can see on the monitoring is that there is no performance data for TiDB for a period of time, and I didn’t find any errors in the logs on the corresponding TiDB server (the 156 machine).

I have repeated the export 6-8 times and it was interrupted the same way each time, but from the monitoring it doesn’t look like TiDB restarted.

| username: Kongdom | Original post link

You can check it in the TiDB Dashboard served on port 2379. You can also use the tiup cluster display command to check.
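For example, a quick check from the tiup control machine might look like this (the cluster name tidb-test below is just a placeholder; use the name shown by tiup cluster list):

# List the clusters managed from this control machine
tiup cluster list

# Show the status of every component; a node that went down shows "Down" here,
# and the Dashboard URL is usually printed at the top of the output
tiup cluster display tidb-test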

| username: 末0_0想 | Original post link

I did some digging and found that the TiDB node goes down while the tiup dumpling backup is running. However, I found no restart records in the TiDB node’s logs; instead, the server itself became inaccessible during the backup. I suspect IO pressure or too many open files might be making the server’s port unresponsive, but this is just a guess. Could an expert advise how to verify it?

tiup dumpling -u root -p'HGe34545e9' -P 4000 -h 10.18.104.162 --filetype sql -t 1 -L ./baklog.log -o ./bak -r 2000 -F32MiB
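To check the “too many open files” guess, I assume something like the following could be run on the TiDB host while the export is in progress (a rough sketch, assuming a single tidb-server process on a normal Linux host):

# File-descriptor limit of the running tidb-server process
cat /proc/$(pidof tidb-server)/limits | grep "open files"

# Number of file descriptors it currently has open
ls /proc/$(pidof tidb-server)/fd | wc -l

# Disk pressure during the export (iostat comes from the sysstat package)
iostat -x 1 5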

| username: Kongdom | Original post link

How about checking this out?
http://10.18.104.162:2379/dashboard/#/overview

| username: 末0_0想 | Original post link

http://10.18.104.162:2379/dashboard is not accessible.

| username: tony5413 | Original post link

Why did 162 go down? Did you check the logs?

| username: 裤衩儿飞上天 | Original post link

You can also check the operating system logs.
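For example (assuming a systemd-based distribution; adjust the time window to when the node became unreachable):

# Kernel messages around the time of the failure
journalctl -k --since "1 hour ago"

# Or grep the traditional log files, depending on the distribution
grep -iE "oom|out of memory|error" /var/log/messages /var/log/syslog 2>/dev/null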

| username: 末0_0想 | Original post link

The only error log I can still see is the following, and the service has crashed and cannot be restarted:
[2023/05/18 09:21:47.987 +08:00] [INFO] [sst_importer.rs:442] ["shrink cache by tick"] ["retain size"=0] ["shrink size"=0]
[2023/05/18 09:21:52.691 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:21:57.693 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:21:57.988 +08:00] [INFO] [sst_importer.rs:442] ["shrink cache by tick"] ["retain size"=0] ["shrink size"=0]
[2023/05/18 09:22:02.694 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:22:07.695 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:22:07.989 +08:00] [INFO] [sst_importer.rs:442] ["shrink cache by tick"] ["retain size"=0] ["shrink size"=0]
[2023/05/18 09:22:12.696 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:22:17.697 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:22:17.991 +08:00] [INFO] [sst_importer.rs:442] ["shrink cache by tick"] ["retain size"=0] ["shrink size"=0]
[2023/05/18 09:22:22.698 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:22:27.700 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]
[2023/05/18 09:22:27.991 +08:00] [INFO] [sst_importer.rs:442] ["shrink cache by tick"] ["retain size"=0] ["shrink size"=0]
[2023/05/18 09:22:32.701 +08:00] [WARN] [errors.rs:155] ["backup stream meet error"] [verbose_err="Etcd(GRpcStatus(Status { code: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", source: None }))"] [err="Etcd meet error grpc request error: status: Unknown, message: "Service was not ready: buffered service failed: load balancer discovery error: transport error: transport error", details: , metadata: MetadataMap { headers: {} }"] [context="failed to get backup stream task"]

| username: Min_Chen | Original post link

Hello,

You can check the TiDB - Server - Uptime monitoring panel to see whether the TiDB server that dropped the connection has restarted.
You can run dmesg -T | grep tidb-server on the operating system to see whether an OOM occurred.
How much memory does the server have? In my experience, -t 4 by itself should not crash the TiDB server.
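As a concrete sketch of the dmesg check (the exact kernel message wording varies by kernel version):

# Was tidb-server killed by the OOM killer?
dmesg -T | grep -i tidb-server
dmesg -T | grep -i "out of memory"

# If the kernel ring buffer has already rotated, the on-disk kernel log may still have it
grep -iE "oom|out of memory" /var/log/messages /var/log/kern.log 2>/dev/null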

| username: 末0_0想 | Original post link

Hello, I checked using your method but found nothing, as shown in the screenshot:

image

The command I used to back up the data is as follows:
mydumper -u root -p 'HGYe9' -P 4000 -h 10.18.104.156 --regex '^(?!(mysql|test|information_schema|METRICS_SCHEMA|performance_schema|sys))' -G -R -E -c -K -r 20000 -t 1 -F 32 --no-schemas -o ./bak -L ./bak.log


Additionally, I found that the server 156 seems to have crashed.

My server is a virtual machine with 8 GB of memory.

I found that as soon as a backup runs, the server becomes unreachable.