Frequently Reports "Wait Response is Cancelled"

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 经常报wait response is cancelled

| username: HACK

【TiDB Usage Environment】Production, Testing, Research
【TiDB Version】
【Encountered Problem】

Checking the tidb.log, I often find the following WARN messages:
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:04:26.993 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20161] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.716 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.751 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]
[2022/07/21 22:07:47.751 +08:00] [WARN] [client_batch.go:625] ["wait response is cancelled"] [to=xxx.xxx.xxx.xxx:20160] [cause="context canceled"]

The to=xxx.xxx.xxx.xxx entries are all TiKV node addresses. What is causing this?

【Reproduction Path】What operations were performed to encounter the issue
【Problem Phenomenon and Impact】

【Attachments】

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.
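For reference, the version of each component can usually be collected with commands along these lines (this assumes the binaries are on the PATH of the corresponding nodes; adjust paths to your deployment):

```shell
tidb-server -V           # TiDB server version
tikv-server --version    # TiKV server version
pd-server -V             # PD server version
cdc version              # TiCDC version (only if TiCDC is deployed)
```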

| username: HACK | Original post link

I came across the following post; is it related to Dumpling?

Has anyone encountered this problem?

| username: 张雨齐0720 | Original post link

"wait response is cancelled" means the request was sent to TiKV but timed out without receiving a response. You need to check the response time of the corresponding TiKV address, as well as the PD and TiKV logs for the corresponding Region at that time, to determine why TiKV did not respond in time.
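As a rough starting point, one way to correlate the warnings with TiKV/PD behavior is to grep the logs on the corresponding nodes around the same timestamp. The paths below follow a typical TiUP deployment layout and are only assumptions; adjust them to the actual deployment directories:

```shell
# Hypothetical log paths; adjust to your deployment layout
TS='2022/07/21 22:04'
grep "$TS" /tidb-deploy/tikv-20160/log/tikv.log
grep "$TS" /tidb-deploy/tikv-20161/log/tikv.log
grep "$TS" /tidb-deploy/pd-2379/log/pd.log

# Optionally search for busy/slow-related messages around that time
# (exact message wording varies by TiKV version)
grep -iE 'slow|busy' /tidb-deploy/tikv-20160/log/tikv.log | grep "$TS"
```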

| username: HACK | Original post link

After some tracking, I found that the warning message stopped appearing once I stopped the dumpling task. I guess it’s a bug.

| username: db_user | Original post link

This can't necessarily be considered a bug. Observe the cluster load while Dumpling is running; if the load and IOPS are high, responses will naturally be slow. Check the status of TiKV and PD. High load with slow responses is normal.
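For example, a quick sketch of watching disk and network pressure on the TiKV hosts during the backup window, using standard Linux tools (iostat and sar come from the sysstat package; intervals are illustrative):

```shell
# Run on each TiKV host while the Dumpling task is running
iostat -x 1 5      # per-device IO utilization and await
sar -n DEV 1 5     # per-interface network throughput
uptime             # load average snapshot
```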

| username: HACK | Original post link

My database is very small; below is the time the backup took:

===== Start backing up database [dzdz] =====

[2022/07/21 01:00:01.910 +08:00] [INFO] [config.go:158] ["detect server type"] [type=TiDB]

[2022/07/21 01:00:01.910 +08:00] [INFO] [config.go:176] ["detect server version"] [version=4.0.6]

[2022/07/21 01:00:50.422 +08:00] [INFO] [main.go:216] ["dump data successfully, dumpling will exit now"]

It was done in less than a minute.

| username: HACK | Original post link

All of our TiDB clusters that use Dumpling show this WARN message at the time of the backup.

| username: db_user | Original post link

Please export the redacted PD and TiKV logs for the corresponding time period.

| username: HACK | Original post link

I have quite a few TiKV nodes here. I downloaded the logs of one instance; can you take a look and see if you can find any clues in them?

Additional information:
My scheduled Dumpling task starts at 1 AM every day. It connects from the control machine to port 4000 of one TiDB server instance and then performs the backup.

The warning messages appear on that same TiDB server node, the one serving port 4000.
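For reference, a minimal sketch of what such a scheduled job might look like (the script name, host, credentials, and output path are placeholders; the flags are standard Dumpling options):

```shell
# crontab entry on the control machine (placeholder script and log paths)
# 0 1 * * * /usr/local/bin/backup_dzdz.sh >> /var/log/dumpling_backup.log 2>&1

#!/bin/bash
# backup_dzdz.sh -- dump the dzdz database through one TiDB server on port 4000
dumpling \
  -h <tidb-server-ip> -P 4000 -u root -p '<password>' \
  -B dzdz \
  --filetype sql \
  -o /data/backup/dzdz_$(date +%F)
```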

| username: cs58_dba | Original post link

Is it possible that there is a large transaction? I have run into this with MySQL before, where FTWRL (FLUSH TABLES WITH READ LOCK) couldn't acquire the lock and caused problems with the backup.

| username: HACK | Original post link

It shouldn’t be. This warning message appears every time a backup is made.

| username: cs58_dba | Original post link

Try backing up at a different time, or manually back up to see if the error can be reproduced.

| username: HACK | Original post link

Hello, are there any new findings on this?

| username: db_user | Original post link

Can you compress a portion of the TiKV logs or PD logs using zip? My company computer can’t decompress rar files.

| username: HACK | Original post link

I have converted it.

| username: db_user | Original post link

Hello, could you please extract the TiKV logs around 2022/07/21 22:04:26? The files above don’t have logs around this time. Also, could you confirm if there are only three PD instances? If not, please also extract the PD leader’s logs. Additionally, please check the resources during the backup phase, including IO and network, to see if they are at their limits. If they are not at their limits, it could also be a locking issue.
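If it helps, a minimal sketch for extracting just that time window and packaging it as a zip (the log paths are assumptions; adjust them to the actual deployment directories):

```shell
# Hypothetical paths; adjust to your deployment
grep '2022/07/21 22:0' /tidb-deploy/tikv-20160/log/tikv.log > tikv-20160-2204.log
grep '2022/07/21 22:0' /tidb-deploy/pd-2379/log/pd.log      > pd-leader-2204.log
zip logs-20220721.zip tikv-20160-2204.log pd-leader-2204.log
```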

| username: HACK | Original post link

> Hello, could you please extract the TiKV logs around 2022/07/21 22:04:26? The files above don't have logs around that time.

Let me extract them for you; please wait a moment.

> Also, could you confirm whether there are only three PD nodes? If not, please extract the PD leader's logs as well.

There are indeed only three PD nodes, and the PD leader's logs are in the attachment.

> Additionally, please check the resources during the backup phase, including IO and network, to see if they are maxed out. If not, it might be a locking issue.

There are no resource bottlenecks; my database is very small.

| username: HACK | Original post link

I can't retrieve the TiKV logs at the moment. I checked tikv.log on several nodes, and there were no error messages around that time.

All my clusters exhibit this phenomenon; as soon as I start a dumpling backup, this warning message appears.

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.