TiDB 4000 Service Keeps Restarting

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 4000服务不断重启

| username: TiDB_New_People

One of our three TiDB servers keeps restarting. Could everyone please help take a look?
The logs are as follows:
Aug 10 21:12:37 TIDB-PD1 bash: [2022/08/10 21:12:37.001 +08:00] [WARN] [config.go:1004] ["Some configuration options should be moved to [instance] section. Please use the latter config options in [instance] instead: (slow-threshold, tidb_slow_log_threshold)."]
Aug 10 21:18:31 TIDB-PD1 kernel: audit: audit_lost=5113587 audit_rate_limit=512 audit_backlog_limit=16384
Aug 10 21:18:31 TIDB-PD1 kernel: audit: rate limit exceeded
Aug 10 21:21:51 TIDB-PD1 systemd: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 10 21:21:51 TIDB-PD1 systemd: Unit tidb-4000.service entered failed state.
Aug 10 21:21:51 TIDB-PD1 systemd: tidb-4000.service failed.
Aug 10 21:22:06 TIDB-PD1 systemd: tidb-4000.service holdoff time over, scheduling restart.
Aug 10 21:22:06 TIDB-PD1 systemd: Stopped tidb service.
Aug 10 21:22:06 TIDB-PD1 systemd: Started tidb service.
Aug 10 21:22:06 TIDB-PD1 bash: [2022/08/10 21:22:06.749 +08:00] [WARN] [config.go:1004] ["Some configuration options should be moved to [instance] section. Please use the latter config options in [instance] instead: (slow-threshold, tidb_slow_log_threshold)."]
Aug 10 21:24:00 TIDB-PD1 systemd-logind: New session 5981 of user root.
Aug 10 21:24:00 TIDB-PD1 systemd: Started Session 5981 of user root.
Aug 10 21:24:01 TIDB-PD1 systemd-logind: New session 5982 of user root.
Aug 10 21:24:01 TIDB-PD1 systemd: Started Session 5982 of user root.
Aug 10 21:26:54 TIDB-PD1 systemd: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 10 21:26:54 TIDB-PD1 systemd: Unit tidb-4000.service entered failed state.
Aug 10 21:26:54 TIDB-PD1 systemd: tidb-4000.service failed.
Aug 10 21:27:09 TIDB-PD1 systemd: tidb-4000.service holdoff time over, scheduling restart.
Aug 10 21:27:09 TIDB-PD1 systemd: Stopped tidb service.
Aug 10 21:27:09 TIDB-PD1 systemd: Started tidb service.
Aug 10 21:27:09 TIDB-PD1 bash: [2022/08/10 21:27:09.751 +08:00] [WARN] [config.go:1004] ["Some configuration options should be moved to [instance] section. Please use the latter config options in [instance] instead: (slow-threshold, tidb_slow_log_threshold)."]
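The timestamps above show a pattern worth quantifying: how long tidb-server stays up between restarts. A minimal Python sketch, assuming the journal lines are available as plain text in the format shown; the function names and the assumed year 2022 (syslog lines omit the year) are illustrative choices, not part of any tool. It also keeps systemd's code= field: a process killed by the kernel OOM killer would normally be reported as code=killed, status=9/KILL, whereas each exit here is code=exited, status=2.

```python
import re
from datetime import datetime

# Syslog prefix: "<Mon> <day> <HH:MM:SS> <host> <process>: <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:]+): (.*)$")
EXIT_RE = re.compile(r"main process exited, code=(\w+), status=(\d+)")


def parse_syslog_ts(stamp, year=2022):
    # Syslog omits the year; 2022 is assumed from the TiDB log timestamps.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def restart_cycles(lines, unit="tidb-4000.service", started_msg="Started tidb service."):
    """Pair each 'Started' message with the next exit of `unit`.

    Returns a list of (seconds_up, code, status) tuples. An exit with no
    preceding 'Started' line in the excerpt is skipped.
    """
    cycles, started = [], None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group(3) != "systemd":
            continue  # ignore systemd-logind, kernel, bash, ...
        ts, msg = parse_syslog_ts(m.group(1)), m.group(4)
        if msg == started_msg:
            started = ts
        elif msg.startswith(unit + ":"):
            e = EXIT_RE.search(msg)
            if e and started is not None:
                cycles.append(((ts - started).total_seconds(), e.group(1), int(e.group(2))))
                started = None
    return cycles


# Lines copied from the log above.
excerpt = """\
Aug 10 21:21:51 TIDB-PD1 systemd: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 10 21:22:06 TIDB-PD1 systemd: Started tidb service.
Aug 10 21:24:00 TIDB-PD1 systemd-logind: New session 5981 of user root.
Aug 10 21:26:54 TIDB-PD1 systemd: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 10 21:27:09 TIDB-PD1 systemd: Started tidb service.""".splitlines()

for up, code, status in restart_cycles(excerpt):
    print(f"up {up:.0f}s, code={code}, status={status}")
```

On this excerpt the one complete cycle lasts 288 seconds (21:22:06 to 21:26:54); the first exit is skipped because its start is not in the excerpt. The Aug 16 log later in the thread shows a much shorter cycle of 6 seconds (17:51:40 to 17:51:46).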

| username: xfworld

  1. Are the connections to the three TiDB nodes balanced?

  2. Is the resource usage of the three TiDB nodes consistent, or is there a significant difference?

  3. Use TiDB Dashboard to check the top 10 slow queries and see whether any of them consume a large amount of memory, which could lead to an OOM (out-of-memory) crash.

  4. Enable diagnostic capabilities by setting the execution time and memory usage limits for SQL queries to capture SQL-related problems.

  5. You can also try setting a global maximum execution time for SQL queries. Any query that runs longer than this limit is killed, which helps mitigate TiDB OOM issues.
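Checks 1 and 2 come down to comparing the same metric across the three nodes. A minimal sketch, assuming the per-node numbers have already been collected (for example connection counts by grouping INFORMATION_SCHEMA.CLUSTER_PROCESSLIST on its INSTANCE column, or memory usage from Grafana); the function name, node names, counts and the 1.5 threshold are all hypothetical.

```python
def imbalance(per_node):
    """Return (busiest_node, ratio of its value to the mean across nodes).

    Works for any non-negative per-node metric: connection count, memory
    in bytes, CPU percent. A ratio near 1.0 means the load is even.
    """
    if not per_node:
        raise ValueError("per_node must not be empty")
    mean = sum(per_node.values()) / len(per_node)
    busiest = max(per_node, key=per_node.get)
    if mean == 0:
        return busiest, 1.0  # all nodes idle: treat as balanced
    return busiest, per_node[busiest] / mean


# Hypothetical connection counts for three TiDB nodes.
connections = {"tidb-1:4000": 180, "tidb-2:4000": 60, "tidb-3:4000": 60}
node, ratio = imbalance(connections)
if ratio > 1.5:  # arbitrary threshold chosen for illustration
    print(f"{node} carries {ratio:.1f}x the average load")
```

Comparing against the mean rather than the minimum keeps a single idle node from making an otherwise even cluster look skewed.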

Also, please provide the cluster configuration and the TiDB version so we can assess the problem.

| username: TiDB_New_People

global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /data/tidb-deploy
  data_dir: /data/tidb-data
  os: linux
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: /data/tidb-deploy/monitor-9100
  data_dir: /data/tidb-data/monitor-9100
  log_dir: /data/tidb-deploy/monitor-9100/log
server_configs:
  tidb:
    log.slow-threshold: 300
    mem-quota-query: 4294967296
tidb_servers:
- host: 192.168.16.196
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /data/tidb-deploy/tidb-4000
  log_dir: /data/tidb-deploy/tidb-4000/log
  arch: amd64
  os: linux

| username: TiDB_New_People

Could you please take a look?
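The startup warning says slow-threshold should be replaced by tidb_slow_log_threshold in the [instance] section. Since tiup maps dotted keys under server_configs.tidb to TOML sections (log.slow-threshold ends up as [log] slow-threshold in tidb.toml), the corresponding topology key would be instance.tidb_slow_log_threshold. A minimal sketch of that rename on server_configs.tidb loaded as a plain Python dict (reading and writing the YAML is not shown); only the one mapping named in the warning is included, and the function name is hypothetical. In practice the edit is made with tiup cluster edit-config followed by tiup cluster reload.

```python
# Only the mapping named in the startup warning; this is not a complete
# list of options that moved to [instance].
DEPRECATED_TO_INSTANCE = {
    "log.slow-threshold": "instance.tidb_slow_log_threshold",
}


def move_to_instance(tidb_cfg):
    """Return a copy of server_configs.tidb with deprecated keys renamed.

    Raises ValueError if both the old and the new key are set, since it is
    unclear which value should win.
    """
    cfg = dict(tidb_cfg)  # do not mutate the caller's mapping
    for old, new in DEPRECATED_TO_INSTANCE.items():
        if old in cfg:
            if new in cfg:
                raise ValueError(f"both {old} and {new} are set")
            cfg[new] = cfg.pop(old)
    return cfg


# server_configs.tidb from the topology above, as a plain dict.
tidb_cfg = {"log.slow-threshold": 300, "mem-quota-query": 4294967296}
print(move_to_instance(tidb_cfg))
```

This only addresses the startup warning; it does not by itself explain the restarts.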

| username: TiDB_New_People

There are three TiDB server instances, and all three have the same configuration entries as 196. Could this be a configuration issue?

| username: cheng

Have you adjusted the configuration options mentioned in the warning?

| username: TiDB_New_People

I haven’t made any adjustments.

| username: jansu-dev

  1. Which version of TiDB are you running?
  2. Please send the tidb.toml file from the configuration directory (conf) under the TiDB deploy directory.

| username: TiDB_New_People

Could you please take a look?

# WARNING: This file is auto-generated. Do not edit! All your modifications will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   tidb:
#     aa.b1.c3: value
#     aa.b2.c4: value
mem-quota-query = 4294967296
new_collations_enabled_on_first_bootstrap = true

[log]
slow-threshold = 300

| username: TiDB_New_People

(Image not available.)

| username: Billmay表妹

Which version?

| username: TiDB_New_People

TiDB 6.1

| username: jansu-dev

In that case, the restarts have nothing to do with this configuration. I tested it: although this parameter is not configured correctly, it should not cause TiDB to restart. I recommend working through the points xfworld listed first. If that does not resolve the issue, please provide tidb.log covering the 10 minutes before and after a restart.
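Extracting tidb.log from 10 minutes before to 10 minutes after a restart can be sketched in Python as below, assuming each log entry begins with the bracketed timestamp format seen in the warnings above (e.g. [2022/08/10 21:22:06.749 +08:00]). The function names and the sample lines are hypothetical; the centre time is the 21:21:51 exit from the first systemd log.

```python
import re
from datetime import datetime, timedelta

# TiDB log entries start with e.g. "[2022/08/10 21:22:06.749 +08:00]"
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]")


def parse_tidb_ts(text):
    # %z accepts "+08:00" (with colon) on Python 3.7+
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f %z")


def log_window(lines, center, minutes=10):
    """Yield lines whose timestamp is within +/- `minutes` of `center`.

    Lines without a timestamp (e.g. continuation lines of a multi-line
    message) follow the decision made for the last timestamped line.
    """
    lo = center - timedelta(minutes=minutes)
    hi = center + timedelta(minutes=minutes)
    keep = False
    for line in lines:
        m = TS_RE.match(line)
        if m:
            keep = lo <= parse_tidb_ts(m.group(1)) <= hi
        if keep:
            yield line


restart = parse_tidb_ts("2022/08/10 21:21:51.000 +08:00")
sample = [  # hypothetical log lines
    '[2022/08/10 21:05:00.000 +08:00] [INFO] [hypothetical.go:1] ["outside the window"]',
    '[2022/08/10 21:22:06.749 +08:00] [INFO] [hypothetical.go:2] ["inside the window"]',
    "goroutine 1 [running]:",
    '[2022/08/10 21:40:00.000 +08:00] [INFO] [hypothetical.go:3] ["outside the window"]',
]
for line in log_window(sample, restart):
    print(line)
```

Because it is a generator, the same function can stream a large tidb.log file object line by line without loading it into memory.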

| username: TiDB_New_People

Please take a look at the log.

| username: jansu-dev

  1. It might be caused by the use of PREPARE statements (source: PingCAP documentation, "Prepare 语句执行计划缓存" / SQL Prepared Execution Plan Cache). Is this a production environment?
  2. You need to re-execute the SQL statements that were prepared earlier.

| username: TiDB_New_People

Yes, this is a production environment. Let me look into your reply first.

| username: TiDB_New_People

The system log is as follows:
Aug 16 17:48:15 TIDB-PD1 systemd: Created slice User Slice of tidb.
Aug 16 17:48:15 TIDB-PD1 systemd-logind: New session 6512 of user tidb.
Aug 16 17:48:15 TIDB-PD1 systemd: Started Session 6512 of user tidb.
Aug 16 17:48:15 TIDB-PD1 systemd-logind: Removed session 6512.
Aug 16 17:48:15 TIDB-PD1 systemd: Removed slice User Slice of tidb.
Aug 16 17:48:15 TIDB-PD1 systemd: Created slice User Slice of tidb.
Aug 16 17:48:15 TIDB-PD1 systemd-logind: New session 6513 of user tidb.
Aug 16 17:48:15 TIDB-PD1 systemd: Started Session 6513 of user tidb.
Aug 16 17:48:15 TIDB-PD1 systemd: Reloading.
Aug 16 17:48:15 TIDB-PD1 systemd: Started blackbox_exporter service.
Aug 16 17:48:15 TIDB-PD1 systemd-logind: Removed session 6513.
Aug 16 17:48:15 TIDB-PD1 systemd: Removed slice User Slice of tidb.
Aug 16 17:48:15 TIDB-PD1 bash: level=info ts=2022-08-16T09:48:15.669186844Z caller=main.go:213 msg="Starting blackbox_exporter" version="(version=0.12.0, branch=HEAD, revision=4a22506cf0cf139d9b2f9cde099f0012d9fcabde)"
Aug 16 17:48:15 TIDB-PD1 bash: level=info ts=2022-08-16T09:48:15.669802622Z caller=main.go:220 msg="Loaded config file"
Aug 16 17:48:15 TIDB-PD1 bash: level=info ts=2022-08-16T09:48:15.669927186Z caller=main.go:324 msg="Listening on address" address=:9115
Aug 16 17:48:15 TIDB-PD1 systemd: Created slice User Slice of tidb.
Aug 16 17:48:15 TIDB-PD1 systemd-logind: New session 6514 of user tidb.
Aug 16 17:48:15 TIDB-PD1 systemd: Started Session 6514 of user tidb.
Aug 16 17:48:15 TIDB-PD1 systemd-logind: Removed session 6514.
Aug 16 17:48:15 TIDB-PD1 systemd: Removed slice User Slice of tidb.
Aug 16 17:51:25 TIDB-PD1 systemd: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 16 17:51:25 TIDB-PD1 systemd: Unit tidb-4000.service entered failed state.
Aug 16 17:51:25 TIDB-PD1 systemd: tidb-4000.service failed.
Aug 16 17:51:40 TIDB-PD1 systemd: tidb-4000.service holdoff time over, scheduling restart.
Aug 16 17:51:40 TIDB-PD1 systemd: Stopped tidb service.
Aug 16 17:51:40 TIDB-PD1 systemd: Started tidb service.
Aug 16 17:51:40 TIDB-PD1 bash: [2022/08/16 17:51:40.508 +08:00] [WARN] [config.go:1004] ["Some configuration options should be moved to [instance] section. Please use the latter config options in [instance] instead: (slow-threshold, tidb_slow_log_threshold)."]
Aug 16 17:51:46 TIDB-PD1 systemd: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 16 17:51:46 TIDB-PD1 systemd: Unit tidb-4000.service entered failed state.
Aug 16 17:51:46 TIDB-PD1 systemd: tidb-4000.service failed.
Aug 16 17:52:01 TIDB-PD1 systemd: tidb-4000.service holdoff time over, scheduling restart.
Aug 16 17:52:01 TIDB-PD1 systemd: Stopped tidb service.
Aug 16 17:52:01 TIDB-PD1 systemd: Started tidb service.
Aug 16 17:52:01 TIDB-PD1 bash: [2022/08/16 17:52:01.256 +08:00] [WARN] [config.go:1004] ["Some configuration options should be moved to [instance] section. Please use the latter config options in [instance] instead: (slow-threshold, tidb_slow_log_threshold)."]
Aug 16 18:00:01 TIDB-PD1 systemd: Started Session 6515 of user root.

| username: xfworld

What was this TiDB node executing during this period?

| username: jansu-dev

Is it still not resolved? After restarting the application, does tidb.log still show the same error?
If so, enable debug logging for TiDB and post a new tidb.log:

  1. Change the TiDB log level to debug:
     tiup cluster edit-config szp-test
     (change the log level parameter in the editor)
     tiup cluster reload szp-test -R tidb
  2. Collect tidb.log and post it.

Please also collect tidb_stderr.log.
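Exit status 2 (which systemd labels INVALIDARGUMENT, its symbolic name for that code) is also what a Go binary such as tidb-server returns when it dies from an unrecovered panic or a fatal runtime error such as "fatal error: runtime: out of memory". The Go runtime writes those messages to stderr rather than to tidb.log, which is why tidb_stderr.log is worth checking. A minimal sketch that scans it for those two prefixes; the function name and the sample content are hypothetical.

```python
import re

# Line prefixes the Go runtime writes to stderr when a process crashes.
CRASH_RE = re.compile(r"^(panic: |fatal error: )")


def crash_markers(lines):
    """Return the stderr lines that start a Go panic or fatal runtime error."""
    return [line.rstrip("\n") for line in lines if CRASH_RE.match(line)]


# Hypothetical tidb_stderr.log content, for illustration only.
stderr_log = [
    "fatal error: runtime: out of memory\n",
    "\n",
    "runtime stack:\n",
    "goroutine 1 [running]:\n",
]
for marker in crash_markers(stderr_log):
    print(marker)
```

The lines that follow a marker (the goroutine stack) usually identify where the crash happened, so it is worth posting them together with the marker.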

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.