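Using Monit on Linux to Automatically Restart Crashed Services - 叨叨软件测试

Background

Because of application stability problems or limited server resources, an application may crash unexpectedly. When that happens, it needs to be restarted automatically.

In production, a process monitoring and restart mechanism is usually put in place so that an unexpected crash does not cause a long service outage.

Introduction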

Monit - utility for monitoring services on a Unix system

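Monit can monitor and manage processes, programs, files, directories, devices, and more.

Advantages

- Simple to install and configure, and very lightweight
- Can monitor both foreground and background processes (Supervisor cannot monitor processes that start themselves in the background)
- Besides processes, it can also monitor files and system resource usage (CPU, memory, disk)
- Supports dependencies between processes, which controls the start order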
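To make the resource checks and dependency ordering listed above more concrete, here is a minimal sketch that writes a sample Monit service file into a scratch directory. The service names backend and frontend, their match patterns and systemctl units, the log path, the thresholds, and the OUT_DIR and CONF_FILE variables are all hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Write a sample Monit service file into a scratch directory.
# OUT_DIR, CONF_FILE and the services "backend"/"frontend" are hypothetical.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
CONF_FILE="$OUT_DIR/app-stack"

# The quoted 'EOF' keeps the shell from expanding $HOST, which Monit resolves itself.
cat > "$CONF_FILE" <<'EOF'
# Host-level resource checks (CPU, memory)
check system $HOST
    if cpu usage > 90% for 5 cycles then alert
    if memory usage > 85% then alert

# Disk usage of the root filesystem
check filesystem rootfs with path /
    if space usage > 90% then alert

# File check: alert when a (hypothetical) log file grows too large
check file app_log with path /var/log/app.log
    if size > 100 MB then alert

# Hypothetical backend process, matched by a pattern in its command line
check process backend matching "backend-worker"
    start program = "/usr/bin/systemctl start backend"
    stop program  = "/usr/bin/systemctl stop backend"
    if cpu > 80% for 5 cycles then restart

# Hypothetical frontend that is started only after the backend
check process frontend matching "frontend-server"
    start program = "/usr/bin/systemctl start frontend"
    stop program  = "/usr/bin/systemctl stop frontend"
    depends on backend
EOF

echo "wrote $CONF_FILE"
```

To try something like this for real, the file would be copied into /etc/monit.d, checked with monit -t, and loaded with monit reload.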

Disadvantages

Monit detects state changes by polling at a fixed interval, so it cannot notice a failure in real time the way Supervisor does.

Installation

    # Install the EPEL repository
    $ yum -y install epel-release

    # Install monit
    $ yum -y install monit

    # Verify the installation
    $ monit -V
    This is Monit version 5.26.0
    Built with ssl, with ipv6, with compression, with pam and with large files
    Copyright (C) 2001-2019 Tildeslash Ltd. All Rights Reserved.

    # Start the service
    $ systemctl start monit

    # Start the monit daemon
    $ monit

Commands

Official manual: https://mmonit.com/monit/documentation/monit.html

Command format: monit [options]+ [command]
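As a rough cheat sheet for the command format above, the sketch below lists a handful of frequently used monit invocations. It runs in dry-run mode by default and only prints each command. The run helper and the DRY_RUN and SERVICE variables are illustrative names, not part of Monit; nexus is the service name used later in this article.

```shell
#!/bin/sh
# Cheat sheet of common monit invocations.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
SERVICE="${SERVICE:-nexus}"   # service name taken from the article's example

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

run monit -t                    # check the control file syntax
run monit reload                # reinitialize the daemon after a config change
run monit summary               # short status overview of all services
run monit status "$SERVICE"     # detailed status of one service
run monit restart "$SERVICE"    # restart one service
run monit unmonitor "$SERVICE"  # pause monitoring, e.g. during maintenance
run monit monitor "$SERVICE"    # resume monitoring
```

Pausing monitoring with unmonitor before planned maintenance keeps Monit from restarting a service that was stopped on purpose.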

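    # Show help
    $ monit -h

Command options

The help output of monit -h lists all options; the official manual describes each one.

Common commands

The same help output lists the available commands, such as start, stop, restart, monitor, unmonitor, status, summary, and reload.

Configuration

After installing with yum, the default file locations are:

- Global configuration file: /etc/monitrc
- Service monitoring configuration directory: /etc/monit.d
- Log file: /var/log/monit.log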

    # Configuration file
    $ grep -v "^#" /etc/monitrc
    # Check the status of monitored services every 5 seconds
    set daemon  5              # check services at 5-second intervals
    set log syslog

    # Enable the built-in web server
    set httpd port 2812 and
        use address 10.0.0.2   # bind the web server to this address
        # Allow connections from localhost
        allow localhost        # allow localhost to connect to the server and
        # Fixes the local command error: Error receiving data -- Connection reset by peer
        allow 10.0.0.2
        # Allow access from an external IP
        allow x.x.x.x
        # User name and password for the web login
        allow admin:monit      # require user 'admin' with password 'monit'
        #with ssl {            # enable SSL/TLS and set path to server certificate
        #    pemfile: /etc/ssl/certs/monit.pem
        #}

    # Directory of service monitoring configuration files
    include /etc/monit.d/*

Monitoring a service

    # View the nexus monitoring file
    $ cat /etc/monit.d/nexus
    check process nexus
        matching "org.sonatype.nexus.karaf.NexusMain"
        start program = "/root/nexus3/nexus-3.12.1-01/bin/nexus start"
        stop program = "/root/nexus3/nexus-3.12.1-01/bin/nexus stop"
        if failed port 18081 then restart

    # Check the nexus monitoring status
    $ monit status nexus
    Monit 5.26.0 uptime: 3h 48m

    Process 'nexus'
      status                       OK
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      pid                          15191
      parent pid                   1
      uid                          0
      effective uid                0
      gid                          0
      uptime                       1m
      threads                      96
      children                     0
      cpu                          0.2%
      cpu total                    0.2%
      memory                       14.3% [1.1 GB]
      memory total                 14.3% [1.1 GB]
      security attribute           -
      disk read                    0 B/s [1.6 MB total]
      disk write                   0 B/s [232.5 MB total]
      port response time           1.756 ms to localhost:18081 type TCP/IP protocol DEFAULT
      data collected               Wed, 13 May 2020 14:36:27

    # Verify that nexus is restarted automatically after being killed
    $ kill -9 15191

    # Within the polling interval, the process has not been restarted yet
    $ monit status nexus
    Monit 5.26.0 uptime: 3h 48m

    Process 'nexus'
      status                       Does not exist
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      data collected               Wed, 13 May 2020 14:36:42

    # Check the nexus monitoring status after the automatic restart
    $ monit status nexus
    Monit 5.26.0 uptime: 3h 48m

    Process 'nexus'
      status                       OK
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      pid                          15830
      parent pid                   1
      uid                          0
      effective uid                0
      gid                          0
      uptime                       0m
      threads                      52
      children                     0
      cpu                          64.0%
      cpu total                    64.0%
      memory                       4.5% [349.2 MB]
      memory total                 4.5% [349.2 MB]
      security attribute           -
      disk read                    0 B/s [84 kB total]
      disk write                   0 B/s [36.9 MB total]
      port response time           -
      data collected               Wed, 13 May 2020 14:36:45

    # View the log of the whole process
    $ tail -n 20 -f /var/log/monit.log
    ......
    [CST May 13 14:35:09] error    : 'nexus' process is not running
    [CST May 13 14:35:09] info     : 'nexus' trying to restart
    [CST May 13 14:35:09] info     : 'nexus' start: '/root/nexus3/nexus-3.12.1-01/bin/nexus start'
    [CST May 13 14:35:17] info     : Reinitializing monit daemon
    [CST May 13 14:35:17] info     : Reinitializing Monit -- control file '/etc/monitrc'
    [CST May 13 14:35:17] info     : 'VM_0_2_centos' Monit reloaded
    [CST May 13 14:36:42] error    : 'nexus' process is not running
    [CST May 13 14:36:42] info     : 'nexus' trying to restart
    [CST May 13 14:36:42] info     : 'nexus' start: '/root/nexus3/nexus-3.12.1-01/bin/nexus start'
    [CST May 13 14:36:45] info     : 'nexus' process is running with pid 15830

Web console

Web console address: http://10.0.0.2:2812/
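The console can also be queried from a script. The sketch below assumes that Monit's built-in HTTP server serves a machine-readable status document at /_status?format=xml; that path is an assumption here and should be verified against the Monit version in use. The address and the admin:monit credentials come from the configuration in this article, while MONIT_URL, MONIT_AUTH, DRY_RUN, and status_url are illustrative names. By default the script only prints the URL it would fetch.

```shell
#!/bin/sh
# Query the Monit web console from a script.
# Assumption: the status document lives at /_status?format=xml.
MONIT_URL="${MONIT_URL:-http://10.0.0.2:2812}"
MONIT_AUTH="${MONIT_AUTH:-admin:monit}"   # user:password from the httpd section
DRY_RUN="${DRY_RUN:-1}"

# Build the status URL from the console base address.
status_url() {
    printf '%s/_status?format=xml\n' "$MONIT_URL"
}

if [ "$DRY_RUN" = "1" ]; then
    echo "would fetch: $(status_url)"
else
    # The client address must be permitted by an "allow" line in the httpd section.
    curl --silent --show-error --max-time 5 --user "$MONIT_AUTH" "$(status_url)"
fi
```

Running it with DRY_RUN=0 from a host covered by one of the allow lines performs the actual request.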

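The console has four views: the home page, Monit runtime information, system monitoring information, and process monitoring information.

Source: Tencent Cloud Community, by 叨叨软件测试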
