Hello again fullzero,
I haven't checked the thread since page 84, so I'm sure I have plenty to catch up on. I see that you and some of the other posters went ahead and made a monitoring script. I'm kind of sorry I've been out of the loop, because I also made a monitoring script to deal with my recurring soft/hard crashes.
I'm not sure whether your script covers the same things mine does, or whether the two can be merged, so I'll let you be the judge. Basically, it monitors the system load average, and if the load goes over a specified amount it reboots the machine and sends me an email once the rig is back up. In my case a load over 2.0 means I had a soft crash, and the load average will soon climb until not even SSH works. The script also checks the external IP of the rig and sends an email if the IP has changed (I have a dynamic IP).
#!/bin/bash
#this script will check avg load and reboot & email you when needed.
#this script will check when the external IP changes and email you.
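# the first value after "load average:" is the 1-minute average
systemavg=$(uptime | awk -F'load average:' '{print $2}' | cut -d',' -f1)
myip="$(dig +short myip.opendns.com @resolver1.opendns.com)"
oldip="$(cat /root/jobs/myIP.txt)"
# replace YOUR_EMAIL with the address that should receive the alerts
emailaddress="YOUR_EMAIL"
shouldsendmail="$(cat /root/jobs/shouldsendmail.txt)"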
# an empty $myip means the DNS lookup failed, i.e. no connection
if [ -n "$myip" ]
then
    echo "$(date) Current system load in the last 1 min is : $systemavg - My public IP address: ${myip}" >> /root/jobs/log.txt
    if [ "$myip" != "$oldip" ]
    then
        echo "${myip}" > /root/jobs/myIP.txt
        echo "The new IP address is: $myip" | mail -s "System IP Change!" "$emailaddress"
    fi
    # the flag file is set to YES right before a load-triggered reboot
    if [ "$shouldsendmail" == "YES" ]
    then
        echo "System was rebooted due to excessive load - $(date)" | mail -s "System rebooted!" "$emailaddress"
        echo "NO" > /root/jobs/shouldsendmail.txt
    fi
else
    echo "$(date) Current system load in the last 1 min is : $systemavg - My public IP address: ${oldip} - no connection!" >> /root/jobs/log.txt
fi
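# bash cannot compare decimals, so scale the load by 100 and round to an integer
F1=$(echo "$systemavg*100" | bc)
Flag=$(printf "%.0f\n" "$F1")
# 200 corresponds to a load average of 2.0
if [ "$Flag" -gt 200 ]
then
    sudo service lightdm stop
    echo "YES" > /root/jobs/shouldsendmail.txt
    echo "Rebooting system because of load - $(date)" >> /root/jobs/log.txt
    sleep 1
    sudo systemctl reboot
fi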
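Side note: the load check above depends on bc being installed. If it is missing on your rig, the same comparison can probably be done with awk alone. This is only a rough sketch and not part of the script I actually run; the function name load_exceeds is made up for illustration, and it assumes a Linux system where /proc/loadavg is available.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script):
# succeeds (exit status 0) when the load in $1 is strictly above the limit in $2.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# The first field of /proc/loadavg is the 1-minute load average.
current=$(cut -d' ' -f1 /proc/loadavg)

if load_exceeds "$current" 2.0
then
    echo "load $current is over the limit"
fi
```

awk does the decimal comparison itself, so the multiply-by-100 and rounding steps are not needed with this approach.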
The monitoring script is run from crontab every minute and needs to be located in /root/jobs.
Please tell me if this is redundant considering the additions to v18.
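For anyone who wants to try it, the crontab entry would look roughly like the line below. The file name monitor.sh is just a placeholder, since you can save the script under any name; I am assuming it goes into root's crontab (sudo crontab -e) and that the file has been made executable with chmod +x.

```
# run the monitoring script once every minute
* * * * * /root/jobs/monitor.sh
```

The five asterisks mean every minute of every hour of every day.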