Post
Topic
Board Mining (Altcoins)
Re: [OS] nvOC easy-to-use Linux Nvidia Mining v0018
by
fullzero
on 17/07/2017, 03:01:11 UTC
Hello again fullzero,

Haven't checked the thread since page 84, so I'm sure I have plenty to catch up on. I see you and some of the posters went ahead and made a monitoring script. Kind of sorry I've been out of the loop, since I too made a monitoring script because of my recurring soft/hard crashes.


I'm not sure if your script covers the same thing mine does, or if they can be merged, so I'll let you be the judge. Basically it monitors the system load average, and if it goes over a specified amount (in my case over 2.0 means I had a soft crash, and the load average will soon climb until not even SSH works) it reboots the machine and sends me an email after the rig is back up. The script also checks the external IP of the rig and sends an email if the IP has changed (I have a dynamic IP).

Code:
#!/bin/bash
#this script will check avg load and reboot & email you when needed.
#this script will check when the external IP changes and email you.

systemavg=$(uptime | awk -F'load average:' '{print $2}'| cut -d',' -f1)
myip="$(dig +short myip.opendns.com @resolver1.opendns.com)"
oldip="$(cat /root/jobs/myIP.txt)"
emailaddress="YOUR_EMAIL"   # replace with your own address
shouldsendmail="$(cat /root/jobs/shouldsendmail.txt)"
if [ -n "$myip" ]
then    
   echo "$(date)  Current system load in the last 1 min is : $systemavg - My public IP address: ${myip}" >> /root/jobs/log.txt
   if [ "$myip" != "$oldip" ]
    then
      echo "${myip}" > /root/jobs/myIP.txt
      echo "The new IP address is: $myip" | mail -s "System IP Change!" $emailaddress
    fi

   if [ "$shouldsendmail" == "YES" ]
    then
       echo "System was rebooted due to excessive load - $(date)" | mail -s "System rebooted!" "$emailaddress"
       echo "NO" > /root/jobs/shouldsendmail.txt
    fi
else
   echo "$(date)  Current system load in the last 1 min is : $systemavg - My public IP address: ${oldip} - no connection!" >> /root/jobs/log.txt
fi

# Scale the load by 100 and round it to an integer so the shell can compare it.
F1=$(echo "$systemavg * 100" | bc)
Flag=$(printf "%.0f" "$F1")
if [ "$Flag" -gt 200 ]
then
   sudo service lightdm stop
   echo "YES" > /root/jobs/shouldsendmail.txt
   echo "Rebooting system because of load - $(date)" >> /root/jobs/log.txt
   sleep 1
   sudo systemctl reboot
fi

Thanks for sharing your script.

This script is run from crontab every minute and needs to be located in /root/jobs.
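For anyone setting this up, a crontab entry for a one-minute schedule might look like the sketch below. The file name `monitor.sh` is an assumption (the post only gives the directory), and the helper `cron_line` is a hypothetical name used to build the line.

```shell
# Hypothetical script name; the post only says the script lives in /root/jobs.
SCRIPT=/root/jobs/monitor.sh

# Build the crontab line. Five asterisks mean "every minute of every day".
cron_line() {
    printf '* * * * * %s\n' "$1"
}

# Print the line so it can be reviewed before installing it.
cron_line "$SCRIPT"

# To append it to root's crontab (run as root):
#   (crontab -l 2>/dev/null; cron_line "$SCRIPT") | crontab -
```

The script also has to be executable (`chmod +x`) for cron to run it directly.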

please tell me if this is redundant considering the additions to v18.

Script is by IAmNotAJeep and Maxximus007; I just made a few edits and integrated it into 1bash.

I have been testing several different versions of it over the past few days, with different values and some small modifications, on different rigs.

I have found that most problems are resolved by killing and restarting the mining process, and that most of the reboots occurring in the current v0018 are unnecessary, so I have significantly relaxed the values that trigger a reboot.
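The restart-first idea can be sketched as a small escalation policy. This is not the actual watchdog code, only an illustration; the function name `decide_action` and the threshold of 3 are assumptions.

```shell
# Hypothetical escalation policy, not the actual watchdog implementation.
# $1 = number of consecutive failed health checks
# $2 = how many miner restarts to try before falling back to a reboot
decide_action() {
    local failures="$1" max_restarts="$2"
    if [ "$failures" -eq 0 ]; then
        echo "none"       # miner is healthy, nothing to do
    elif [ "$failures" -le "$max_restarts" ]; then
        echo "restart"    # cheap fix first: kill and restart the miner
    else
        echo "reboot"     # restarts did not help, reboot as a last resort
    fi
}

# Example use inside a watchdog loop (the actions are placeholders):
#   case "$(decide_action "$failures" 3)" in
#       restart) echo "restart the mining process here" ;;
#       reboot)  sudo systemctl reboot ;;
#   esac
```

Raising the second argument is one way to "relax" the reboot condition: the rig only reboots after that many restarts in a row have failed.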

By doing this, most of the issues that eventually lead to the need to reboot are avoided.  I actually haven't had a GPU fall off the bus or a client fail to reinitialize (although I probably use a more conservative OC than most miners).  These are the two situations that a reboot should resolve.  If a hard crash happens, the rig is frozen, so the watchdog cannot trigger a reboot.

There is a problem when too high an OC is used: an endless loop of client restarts and then reboots will occur.  Adding logic to detect excessive OC is the best way to deal with this, IMO.
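One possible way to spot such a loop is to log each watchdog-triggered reboot and count how many fall inside a recent time window. The sketch below assumes that approach; the function names, the log file, and the limits are all hypothetical, and a high count is only a hint that the OC may be too aggressive, not proof.

```shell
# Hypothetical helpers for spotting a restart/reboot loop.
# Each watchdog-triggered reboot appends one epoch timestamp to a log file.
record_reboot() {   # $1 = log file, $2 = epoch seconds (defaults to now)
    echo "${2:-$(date +%s)}" >> "$1"
}

# Print how many reboots were recorded in the last $2 seconds.
# $3 is the reference time in epoch seconds (defaults to now).
recent_reboots() {
    local file="$1" window="$2" now="${3:-$(date +%s)}"
    [ -f "$file" ] || { echo 0; return; }
    awk -v now="$now" -v w="$window" '$1 >= now - w { n++ } END { print n + 0 }' "$file"
}

# Exit status 0 when the count reaches the limit.
# $1 = log file, $2 = window in seconds, $3 = limit, $4 = reference time (optional)
oc_looks_excessive() {
    [ "$(recent_reboots "$1" "$2" "$4")" -ge "$3" ]
}

# Example: stop rebooting if there were 3 or more reboots in the last hour.
#   if oc_looks_excessive /root/jobs/reboots.log 3600 3; then
#       echo "Possible excessive OC: skipping reboot" >> /root/jobs/log.txt
#   fi
```

When the check fires, the watchdog could lower the OC or simply stop and notify the owner instead of rebooting again.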

Your system load average method could be added to enhance the IAmNotAJeep and Maxximus007 watchdog, although I am not sure whether client reinitialization mitigates the problem that leads to runaway system load.
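If the load check were added, it could be packaged as a small function that another script can call. The sketch below is an assumption about how that might look, not part of the watchdog: it reads the 1-minute load from `/proc/loadavg` (Linux only) and uses `awk` for the floating-point comparison, so `bc` is not needed. The names `current_load` and `load_exceeds` are hypothetical.

```shell
# Hypothetical helpers; names are illustrative only.

# Print the 1-minute load average (first field of /proc/loadavg on Linux).
current_load() {
    cut -d' ' -f1 /proc/loadavg
}

# Exit status 0 when load ($1) is greater than threshold ($2).
# Adding 0 forces awk to compare the values as numbers.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !((l + 0) > (t + 0)) }'
}

# Example, using the 2.0 threshold from the quoted script:
#   if load_exceeds "$(current_load)" 2.0; then
#       echo "load too high, restart the miner or reboot"
#   fi
```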

I do think having an email module which can be called / inserted into scripts / 1bash would be very helpful and is something a lot of members would use.  I know that lost_post has already made something somewhat like this.

https://bitcointalk.org/index.php?topic=1854250.msg20148804#msg20148804

I am not sure how modular lost_post has made this.  A yes/no switch could trigger installation, configuration, and use, with additional configuration variables supplying the data it uses.
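As a rough idea of what such a switch could look like, here is a minimal sketch. It is not lost_post's module and these are not existing 1bash settings: `SEND_EMAIL`, `EMAIL_ADDRESS`, `MAIL_CMD`, and `send_alert` are hypothetical names, and it assumes a working `mail` command like the one used in the quoted script.

```shell
# Hypothetical configuration variables (assumed names, not 1bash settings).
SEND_EMAIL="NO"              # YES/NO switch for the whole module
EMAIL_ADDRESS="YOUR_EMAIL"   # placeholder, replace with a real address
MAIL_CMD="mail"              # can be overridden to test without a mail setup

# Send an alert only when the switch is on.
# $1 = subject, $2 = message body
send_alert() {
    [ "$SEND_EMAIL" = "YES" ] || return 0   # switch off: silently do nothing
    [ -n "$EMAIL_ADDRESS" ] || return 1     # switch on but no address set
    echo "$2" | "$MAIL_CMD" -s "$1" "$EMAIL_ADDRESS"
}

# Example call from any script that sources this file:
#   send_alert "System rebooted!" "Rebooted due to excessive load - $(date)"
```

Keeping the switch inside the function means calling scripts do not need their own checks; they can call `send_alert` unconditionally.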

Maybe the two of you could work together on this if lost_post hasn't already completed it.

I can include your script as an alternative to the IAmNotAJeep_and_Maxximus007_WATCHDOG if you want; but for now at least: I will default 1bash to using the IAmNotAJeep_and_Maxximus007_WATCHDOG, as it is currently more comprehensive.