Abnormal TiKV restarts increase after upgrading TiDB to 6.5.1

translator_bot · June 22, 2024, 11:29am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 升级至6.5.1后，TiKV重启异常增加

| username: TiDBer_CQ

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v6.5.1
[Reproduction Path] Normal read and write operations
[Encountered Issue: Symptoms and Impact] TiKV restarts abnormally. No errors appear in the TiKV logs, but the system log shows the errors attached below.

[Resource Configuration] 16C 64G
[Attachments: Screenshots/Logs/Monitoring]

Apr 11 11:16:02 master2 systemd[1]: Started Session c41863 of user root.
Apr 11 11:16:08 master2 kernel: [3432600.511534] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem
Apr 11 11:16:08 master2 kernel: [3432600.523842] vmsec-host-net. (3664293): drop_caches: 3
Apr 11 11:16:08 master2 kernel: [3432600.597301] [VMSECNFQ]: Version: 8.0.3
Apr 11 11:16:08 master2 kernel: [3432600.597307] [VMSECNFQ]: Hook nic prefix enp3s0: dev: 000000009c7a1abe, IP: 0.0.0.0
Apr 11 11:16:08 master2 kernel: [3432600.603000] [VMSECNFQ]: init success
Apr 11 11:16:08 master2 kernel: [3432600.617574] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem
Apr 11 11:16:09 master2 kernel: [3432601.748412] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem
Apr 11 11:16:09 master2 kernel: [3432601.750499] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem
Apr 11 11:16:09 master2 kernel: [3432601.773392] [VMSECNFQ]: Safe to release netlink socket as there are no live reference!
Apr 11 11:16:09 master2 kernel: [3432601.773435] [VMSECNFQ]: Safe to quit as there are no ongoing operations!
Apr 11 11:16:10 master2 kernel: [3432601.887681] [VMSECNFQ]: g_vmsec_fw_ctx resource free OK
Apr 11 11:16:14 master2 systemd[1]: session-c41863.scope: Succeeded.
Apr 11 11:16:18 master2 kernel: [3432610.065748] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem
Apr 11 11:16:18 master2 systemd[1]: tikv-20160.service: Main process exited, code=exited, status=1/FAILURE
Apr 11 11:16:18 master2 systemd[1]: tikv-20160.service: Failed with result 'exit-code'.
Apr 11 11:16:33 master2 systemd[1]: tikv-20160.service: Service RestartSec=15s expired, scheduling restart.
Apr 11 11:16:33 master2 systemd[1]: tikv-20160.service: Scheduled restart job, restart counter is at 43.
Apr 11 11:16:33 master2 systemd[1]: Stopped tikv service.
Apr 11 11:16:33 master2 systemd[1]: Started tikv service.

Has anyone encountered the same issue? Does anyone know what might be causing this?

translator_bot · June 22, 2024, 11:29am

| username: 裤衩儿飞上天

Is the hard drive malfunctioning?

translator_bot · June 22, 2024, 11:29am

| username: xingzhenxiang

Check the hard drive, but it looks like a virtual disk.
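
One way to follow up on this suggestion is to check whether the EXT4-fs errors line up in time with the TiKV exits in the system log. The following is a minimal sketch, assuming the syslog line format shown in the original post. The function names, the 30-second window, and the placeholder year are illustrative assumptions, not part of any TiDB or systemd tooling.

```python
import re
from datetime import datetime, timedelta

# Patterns derived from the log excerpt in the original post.
EXT4_ERR = re.compile(r"EXT4-fs error \(device (\w+)\)")
SERVICE_EXIT = re.compile(r"systemd\[1\]: (\S+\.service): Main process exited")
TIMESTAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2})")


def parse_time(line, year=2000):
    """Parse the leading syslog timestamp.

    Syslog omits the year, so `year` is a placeholder (assumption);
    only time differences within one log are used here.
    """
    m = TIMESTAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def ext4_errors_before_exits(lines, window_seconds=30, year=2000):
    """For each service exit, count EXT4-fs errors in the preceding window."""
    window = timedelta(seconds=window_seconds)
    errors = []   # (timestamp, device) for each EXT4-fs error seen so far
    results = []  # (exit timestamp, service name, error count in window)
    for line in lines:
        ts = parse_time(line, year)
        if ts is None:
            continue
        m = EXT4_ERR.search(line)
        if m:
            errors.append((ts, m.group(1)))
            continue
        m = SERVICE_EXIT.search(line)
        if m:
            recent = [e for e in errors if timedelta(0) <= ts - e[0] <= window]
            results.append((ts, m.group(1), len(recent)))
    return results


# Three lines copied from the excerpt in the original post.
sample = [
    "Apr 11 11:16:08 master2 kernel: [3432600.511534] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem",
    "Apr 11 11:16:18 master2 kernel: [3432610.065748] EXT4-fs error (device vdb) in ext4_free_blocks:4972: Corrupt filesystem",
    "Apr 11 11:16:18 master2 systemd[1]: tikv-20160.service: Main process exited, code=exited, status=1/FAILURE",
]

for ts, service, count in ext4_errors_before_exits(sample):
    print(f"{ts:%H:%M:%S} {service}: {count} EXT4-fs error(s) in the preceding 30 s")
```

If most of the restarts recorded by systemd are preceded by EXT4-fs errors on the same device, that would support the disk or filesystem theory. It would not prove it, since the sketch only checks timing.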

translator_bot · June 22, 2024, 11:29am

| username: tidb菜鸟一只

The host memory isn't hitting its limit, is it?
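
A rough way to test the memory theory from the same system log: when the kernel OOM killer ends a process, systemd normally reports `code=killed, status=9/KILL`, and the kernel logs an "Out of memory" line. The excerpt in the original post shows `code=exited, status=1/FAILURE` instead, which suggests TiKV exited on its own. The sketch below encodes that heuristic. The function name and return strings are illustrative assumptions, and the check is not a definitive diagnosis.

```python
import re

# systemd line format as seen in the original post's excerpt.
EXIT_LINE = re.compile(r"Main process exited, code=(\w+), status=(\d+)/(\S+)")
# Kernel OOM-killer message; wording differs slightly between kernel versions.
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process")


def classify_exit(lines):
    """Return a heuristic verdict based on the first systemd exit line found."""
    saw_oom = any(OOM_LINE.search(line) for line in lines)
    for line in lines:
        m = EXIT_LINE.search(line)
        if not m:
            continue
        code, status = m.group(1), int(m.group(2))
        if code == "killed" and status == 9:
            # SIGKILL: consistent with the OOM killer, more so if the
            # kernel also logged an out-of-memory message.
            return "sigkill-with-oom-message" if saw_oom else "sigkill"
        if code == "exited":
            # The process returned an exit status itself; not an OOM kill.
            return f"exited-with-status-{status}"
        return f"other-{code}-{status}"
    return "no-exit-line"


# Line copied from the excerpt in the original post.
line = ("Apr 11 11:16:18 master2 systemd[1]: tikv-20160.service: "
        "Main process exited, code=exited, status=1/FAILURE")
print(classify_exit([line]))
```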

translator_bot · July 26, 2024, 10:40am

| username: TiDBer_C33

Have you found the reason for this?

translator_bot · August 27, 2024, 2:25am

| username: Hacker_zuGnSsfP

Based on this log, it looks like a hardware issue.