Error Using Database: No Available Connections

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用数据库报错 no available connections

| username: TiDBer_eyHUd5pk

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Encountered Issue: Symptoms and Impact]
Connecting to the database fails with "no available connections" (and sometimes a TiKV timeout), even though the cluster status shows as normal.

[Screenshots in the original post: connection error, cluster information, tidb.log, tikv.log]

| username: DBRE

This looks like an error raised by the application rather than by TiDB itself. Check the connection pool configuration first.
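To illustrate why a pool setting can produce this message, here is a minimal sketch of a fixed-size pool. The thread does not say which driver or pool the application uses, so `BoundedPool`, `PoolExhausted`, `max_size`, and `wait_timeout` are hypothetical names, and the error text is copied from the report only for illustration. If every connection is held by a slow query (for example one waiting on a TiKV timeout), the next caller fails on the client side without ever reaching TiDB.

```python
import queue


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the wait timeout."""


class BoundedPool:
    """Toy fixed-size pool (hypothetical, not any real driver's API).

    `factory` is any zero-argument callable that returns a connection object.
    """

    def __init__(self, factory, max_size, wait_timeout):
        self._idle = queue.Queue(maxsize=max_size)
        self._wait_timeout = wait_timeout
        # Open every connection up front so the pool size is fixed.
        for _ in range(max_size):
            self._idle.put(factory())

    def acquire(self):
        # Block for at most wait_timeout seconds waiting for a free connection.
        try:
            return self._idle.get(timeout=self._wait_timeout)
        except queue.Empty:
            raise PoolExhausted("no available connections") from None

    def release(self, conn):
        # Hand the connection back so another caller can use it.
        self._idle.put(conn)
```

With a real pool, the settings to look for are usually the maximum pool size, the wait timeout for acquiring a connection, and whether the application always returns connections after use.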

| username: tidb菜鸟一只

Can you connect with the mysql command-line client from one of the TiDB machines?

| username: TiDBer_pkQ5q1l0

Is your cluster deployed on a single node? Was it usable right after deployment?

| username: TiDBer_eyHUd5pk

It worked fine after the initial deployment, then suddenly stopped working. The cluster consists of two machines.

| username: TiDBer_eyHUd5pk

The machines are on an intranet and lack various dependencies, so installing the MySQL client is very troublesome and I haven't tested this yet. If necessary, I can give it a try.
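If installing the MySQL client is impractical, a standard-library probe can at least show whether the TiDB port answers with a MySQL-protocol handshake. This is a rough sketch, assuming Python 3 is present on the machine and TiDB listens on its default port 4000; `probe` and `parse_handshake` are hypothetical helper names. It only reads the server greeting and does not log in or run SQL.

```python
import socket


def parse_handshake(packet: bytes):
    """Return (protocol_version, server_version) from a MySQL initial handshake packet."""
    if len(packet) < 5:
        raise ValueError("packet too short")
    # Header: 3-byte little-endian payload length, then a 1-byte sequence id.
    payload_len = int.from_bytes(packet[0:3], "little")
    payload = packet[4:4 + payload_len]
    if not payload:
        raise ValueError("empty payload")
    if payload[0] == 0xFF:
        # The server sent an error packet instead of a greeting.
        raise RuntimeError("server refused: " + payload[3:].decode("utf-8", "replace"))
    # Greeting: 1-byte protocol version, then a NUL-terminated version string.
    end = payload.index(b"\x00", 1)
    return payload[0], payload[1:end].decode("ascii", "replace")


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    return buf


def probe(host, port=4000, timeout=3.0):
    """Connect to host:port and return the parsed server greeting."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        header = _recv_exact(sock, 4)
        payload = _recv_exact(sock, int.from_bytes(header[:3], "little"))
    return parse_handshake(header + payload)
```

Calling `probe("127.0.0.1")` on the TiDB machine should return quickly if tidb-server is accepting connections. A timeout or refused connection at this step would suggest the problem sits in front of TiDB rather than in the application's pool.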

| username: TiDBer_eyHUd5pk

How exactly do I check that? I'm not familiar with this area.

| username: TiDBer_eyHUd5pk

Connecting directly on the server gives the same result, though TiKV is on the other server.

| username: tidb菜鸟一只

Does the TiKV host have enough resources? I see that some queries do return results. Is it only the larger SQL queries that fail?

| username: 我是咖啡哥

Yes, it’s very likely that the load is too high, causing occasional timeouts.

| username: liuis

Judging by the symptoms, is the connection pool full? How is the cluster performing?

| username: TiDBer_eyHUd5pk

The disk that hosts TiKV has enough free space, and right now I can't even open a table in Navicat. This SQL isn't large, and it worked normally before.

| username: TiDBer_eyHUd5pk

Now nothing can connect at all, so high load can't be the issue anymore, right?

| username: 我是咖啡哥

Check the load in the Dashboard.

| username: TiDBer_eyHUd5pk

Sorry, which panel shows the load? The latency displayed here is very high, but I'm not sure whether the cause is the cluster or the network. If it is the network, how can I prove that, other than with the latency shown on this dashboard? When I ping directly, the latency is under 1 ms.
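One way to go beyond ping is to time TCP connections to the actual service ports, since ping uses ICMP and does not exercise the ports the components talk on. The sketch below assumes the default ports (TiDB 4000, PD 2379, TiKV 20160) and should be run from the TiDB machine against the TiKV machine's address; `tcp_connect_ms`, `summarize`, and `sample` are hypothetical helper names.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Time one TCP connect to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms):
    """Reduce a list of timings to min, median, and max."""
    ordered = sorted(samples_ms)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def sample(host, port, count=20, pause=0.2):
    """Take `count` connect timings, `pause` seconds apart, and summarize them."""
    results = []
    for _ in range(count):
        results.append(tcp_connect_ms(host, port))
        time.sleep(pause)
    return summarize(results)
```

If `sample(tikv_host, 20160)` also stays in the low-millisecond range with no timeouts, the network path is an unlikely cause of a two-minute latency, and the TiKV host itself (disk, memory, CPU) becomes the more likely place to look.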

| username: 我是咖啡哥

Is your latency really 2 minutes? That definitely won't work.

| username: 我是咖啡哥

You haven't captured the CPU usage, have you? That doesn't mean there is no load.
Log in to both servers and check the load, for example with top or dstat.
Find out where it is slow; latency should normally be under 100 ms.
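If top or dstat output is hard to read, the same first number can be obtained with a few lines of Python on each server. This is a small sketch, assuming a Linux host; the per-core threshold of 1.0 is only a rule of thumb, and `describe` and `current` are hypothetical helper names.

```python
import os


def load_per_core(load_1min, cores):
    """Normalize the 1-minute load average by the number of CPU cores."""
    return load_1min / cores


def describe(load_1min, cores, threshold=1.0):
    """Format the load figures and flag them against a rule-of-thumb threshold."""
    ratio = load_per_core(load_1min, cores)
    state = "saturated" if ratio > threshold else "ok"
    return f"load1={load_1min:.2f} cores={cores} per_core={ratio:.2f} ({state})"


def current():
    """Describe the load of the machine this runs on (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return describe(load_1min, os.cpu_count() or 1)
```

On Linux the load average also counts tasks blocked on disk I/O, so a high value alongside low CPU usage may point at storage rather than CPU.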

| username: TiDBer_eyHUd5pk

Here is the CPU usage for the two servers. One has higher usage, but it is not the TiKV server. Also, I don't know why the first machine shows so many pd-server and tidb-server entries, and I don't dare stop them since this is a production environment.

| username: CuteRay

If you have Grafana, you can post a few panels from the Overview dashboard.
PS: Running only one TiKV in a production environment is strongly discouraged; the recommended minimum is three TiKV instances.

| username: TiDBer_eyHUd5pk

Sorry, Grafana is too specialized for me; I don't know how to use it or how to add an Overview dashboard. The current environment has only one TiKV, but more will be added later.