Single Server Multi-Component Deployment: Sudden Memory Overload Leading to Continuous Reboots

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 单服务器多组件部署,突然内存打满,导致一直重启,内存打满,又重启

| username: SummerGu

What are the causes of these anomalies? How can this problem be resolved?

| username: tidb菜鸟一只 | Original post link

Use tiup cluster display tidb-xxx to check the current status of the cluster components.
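For example (tidb-xxx is a placeholder for the actual cluster name):

```shell
# List the clusters managed by this tiup, then show per-component status for one:
tiup cluster list
tiup cluster display tidb-xxx
```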

| username: SummerGu | Original post link

Three TiKV nodes are repeatedly restarting…

| username: SummerGu | Original post link

@tidb菜鸟一只

| username: 有猫万事足 | Original post link

Most likely the OOM killer has been killing the TiKV processes over and over.
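One way to confirm this is to check the kernel log for OOM-killer activity (a minimal sketch; assumes a Linux host with standard tools):

```shell
# Look for OOM-killer entries (they name the killed process, e.g. tikv-server):
dmesg -T | grep -i -E "out of memory|oom-killer|killed process"
# Or, on a systemd host:
journalctl -k | grep -i "killed process"
```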

For a hybrid deployment, the parameters need to be adjusted: recalculate the values recommended in the documentation and apply them.
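A rough sizing sketch, assuming the rule of thumb from the hybrid-deployment documentation (block cache ≈ half of total memory divided across TiKV instances). Note that this budget assumes TiKV-only hosts; with TiDB, PD, and monitoring sharing the machine, a smaller value is safer, as suggested further down the thread:

```shell
# Assumed rule of thumb: block-cache per TiKV ≈ MEM_TOTAL * 0.5 / instance count
MEM_TOTAL_GB=32
TIKV_INSTANCES=3
echo "block-cache per TiKV ≈ $(( MEM_TOTAL_GB / 2 / TIKV_INSTANCES ))G"
# Prints 5G here -- an upper bound for TiKV-only hosts, not for a hybrid one.
```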

| username: zhanggame1 | Original post link

Use the top command to check memory usage, press Shift+M to sort by memory, and observe which components are using a lot of memory.
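Non-interactive equivalents, for reference:

```shell
# Ten biggest memory consumers:
ps aux --sort=-%mem | head -n 10
# Or start top already sorted by resident memory:
top -o %MEM
```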

| username: SummerGu | Original post link

@有猫万事足 @tidb菜鸟一只


It keeps printing these errors. This is the configuration file I deployed with:

```yaml
global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /opt/soft/tidb/tidb-deploy
  data_dir: /opt/soft/tidb/tidb-data
  os: linux
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: /opt/soft/tidb/tidb-deploy/monitor-9100
  data_dir: /opt/soft/tidb/tidb-data/monitor-9100
  log_dir: /opt/soft/tidb/tidb-deploy/monitor-9100/log
server_configs:
  tidb:
    log.file.max-backups: 7
    log.level: error
    log.slow-threshold: 300
  tikv:
    log.file.max-backups: 7
    log.file.max-days: 7
    log.level: error
    raftstore.capacity: 10G
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: true
    storage.block-cache.capacity: 5G
    storage.block-cache.shared: true
  pd:
    log.file.max-backups: 7
    log.file.max-days: 1
    log.level: error
    replication.enable-placement-rules: true
    replication.location-labels:
    - host
  tidb_dashboard: {}
  tiflash:
    logger.level: info
  tiflash-learner: {}
  pump: {}
  drainer: {}
  cdc: {}
  kvcdc: {}
  grafana: {}
tidb_servers:
- host: 192.168.103.30
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /opt/soft/tidb/tidb-deploy/tidb-4000
  log_dir: /opt/soft/tidb/tidb-deploy/tidb-4000/log
  arch: amd64
  os: linux
tikv_servers:
- host: 192.168.103.30
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /opt/soft/tidb/tidb-deploy/tikv-20160
  data_dir: /opt/soft/tidb/tidb-data/tikv-20160
  log_dir: /opt/soft/tidb/tidb-deploy/tikv-20160/log
  config:
    server.labels:
      host: logic-host-1
  arch: amd64
  os: linux
- host: 192.168.103.30
  ssh_port: 22
  port: 20161
  status_port: 20181
  deploy_dir: /opt/soft/tidb/tidb-deploy/tikv-20161
  data_dir: /opt/soft/tidb/tidb-data/tikv-20161
  log_dir: /opt/soft/tidb/tidb-deploy/tikv-20161/log
  config:
    server.labels:
      host: logic-host-1
  arch: amd64
  os: linux
- host: 192.168.103.30
  ssh_port: 22
  port: 20162
  status_port: 20182
  deploy_dir: /opt/soft/tidb/tidb-deploy/tikv-20162
  data_dir: /opt/soft/tidb/tidb-data/tikv-20162
  log_dir: /opt/soft/tidb/tidb-deploy/tikv-20162/log
  config:
    server.labels:
      host: logic-host-1
  arch: amd64
  os: linux
tiflash_servers: []
pd_servers:
- host: 192.168.103.30
  ssh_port: 22
  name: pd-192.168.103.30-2379
  client_port: 2379
  peer_port: 2380
  deploy_dir: /opt/soft/tidb/tidb-deploy/pd-2379
  data_dir: /opt/soft/tidb/tidb-data/pd-2379
  log_dir: /opt/soft/tidb/tidb-deploy/pd-2379/log
  arch: amd64
  os: linux
monitoring_servers:
- host: 192.168.103.30
  ssh_port: 22
  port: 9090
  ng_port: 12020
  deploy_dir: /opt/soft/tidb/tidb-deploy/prometheus-9090
  data_dir: /opt/soft/tidb/tidb-data/prometheus-9090
  log_dir: /opt/soft/tidb/tidb-deploy/prometheus-9090/log
  external_alertmanagers: []
  arch: amd64
  os: linux
grafana_servers:
- host: 192.168.103.30
  ssh_port: 22
  port: 3000
  deploy_dir: /opt/soft/tidb/tidb-deploy/grafana-3000
  arch: amd64
  os: linux
  username: admin
  password: admin
  anonymous_enable: false
  root_url: ""
  domain: ""
```
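As a side note, tiup can audit a deployed cluster's hosts against its resource requirements, which can surface shortfalls in a hybrid topology like this one (the cluster name is a placeholder):

```shell
tiup cluster check <cluster-name> --cluster
```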
| username: SummerGu | Original post link

It is always one of the three TiKVs.

| username: SummerGu | Original post link

@zhanggame1

| username: 啦啦啦啦啦 | Original post link

Is the machine’s memory 32GB? The storage.block-cache.capacity of 5GB might be a bit too large. Try reducing it a bit.

| username: SummerGu | Original post link

Because the TiKV nodes keep restarting, the modified configuration cannot be reloaded.
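If reload fails because the instances are crash-looping, one possible workaround (a sketch; <cluster-name> is a placeholder) is to stop the role first, edit, then start:

```shell
tiup cluster stop <cluster-name> -R tikv
tiup cluster edit-config <cluster-name>   # adjust server_configs.tikv here
tiup cluster start <cluster-name> -R tikv
```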

| username: SummerGu | Original post link

A lot of inexplicable errors

| username: SummerGu | Original post link

Is it because there are too many regions?

| username: tidb菜鸟一只 | Original post link

First, set the TiKV config storage.block-cache.capacity=3G to prevent OOM. If it is set to 5G, each TiKV can use up to 12G of memory in total; with three instances on a 32G machine, that is not enough, and TiKV will keep getting killed.
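A minimal sketch of making that change (<cluster-name> and the connection details are placeholders):

```shell
# Persist the new value in the topology, then reload only TiKV:
tiup cluster edit-config <cluster-name>
#   server_configs:
#     tikv:
#       storage.block-cache.capacity: "3G"
tiup cluster reload <cluster-name> -R tikv

# Alternatively, change it online via SQL (takes effect immediately but is
# not persisted in the topology file):
mysql -h 192.168.103.30 -P 4000 -u root -e \
  "SET CONFIG tikv \`storage.block-cache.capacity\` = '3GiB';"
```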

| username: Kongdom | Original post link

For single-machine multi-instance hybrid deployment, you can refer to this document to set resource limits.
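For example, tiup topology files support a resource_control block (enforced through systemd cgroup limits); a hedged sketch, where the 8G cap per TiKV instance is an assumption:

```shell
# Under each tikv_servers entry in `tiup cluster edit-config <cluster-name>`, add:
#   resource_control:
#     memory_limit: "8G"
# then apply the change:
tiup cluster reload <cluster-name> -R tikv
```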

| username: xfworld | Original post link

Unstoppable~ :upside_down_face: :upside_down_face: :upside_down_face: :upside_down_face:

| username: redgame | Original post link

The resources are insufficient; you need to add more.

| username: Kongdom | Original post link

Huh? Even after setting it according to the documentation, it still can’t be restricted? Is it only the new version that can achieve perfect resource isolation?

| username: xfworld | Original post link

The new version has soft limits and introduces the concept of a resource pool: total usage will not exceed the pool's upper limit, which is more effective. However, its controllability and ease of use are still lacking, so for now we can only wait.
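For reference, in recent versions this is exposed through SQL resource groups (a minimal sketch; the group name, RU quota, and user are hypothetical, and resource groups meter RU consumption rather than imposing a hard memory cap):

```shell
mysql -h 192.168.103.30 -P 4000 -u root -e "
  CREATE RESOURCE GROUP IF NOT EXISTS rg_app RU_PER_SEC = 2000;
  ALTER USER 'app_user'@'%' RESOURCE GROUP rg_app;
"
```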