Re: [OS] nvOC easy-to-use Linux Nvidia Mining v0018
Board: Mining (Altcoins)
by Maxximus007 on 24/07/2017, 09:08:35 UTC
Hi All,

A GPU was lost in a rig this weekend, and unfortunately the watchdog did not reboot the rig. On inspection, nvidia-smi does not just report the new (lower) number of GPUs; it also prints a warning. The watchdog was not prepared for this message: the non-numeric warning text ended up among the utilization values it compares as numbers, so it simply errored out.
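To see why the old loop broke, here is a minimal sketch, assuming the words of the warning end up in $UTILIZATIONS next to the normal numbers. The sample string and the is_numeric helper are hypothetical, and nvidia-smi's real warning text differs. For a non-numeric operand, bash's [ ... -lt ... ] prints "integer expression expected" and returns status 2, so the old threshold check neither counted the GPU nor triggered a reboot. The regex test catches such values first.

```shell
#!/usr/bin/env bash
# Sketch: compare the old numeric test with the regex check on sample input.
# The UTILIZATIONS value below is a made-up example, not real nvidia-smi output.

THRESHOLD=80
numtest='^[0-9]+$'
UTILIZATIONS="100 98 WARNING: GPU lost"   # hypothetical sample

# Hypothetical helper: succeeds only for plain non-negative integers
is_numeric() {
  [[ $1 =~ $numtest ]]
}

# Unquoted on purpose: word splitting gives one value per word, as in the watchdog
for UTIL in $UTILIZATIONS
do
  if is_numeric "$UTIL"
  then
    echo "numeric: $UTIL"
  else
    # What the old loop did with this value; the error message is suppressed
    old_status=0
    [ "$UTIL" -lt "$THRESHOLD" ] 2>/dev/null || old_status=$?
    echo "not numeric: $UTIL (old test would exit with status $old_status)"
  fi
done
```

Note that every word of the warning is flagged separately, because the for loop splits $UTILIZATIONS on whitespace.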

Here is the new code block for the watchdog:
Code:
  numtest='^[0-9]+$'

  for UTIL in $UTILIZATIONS
  do
    # Normally every value is a number; anything else (e.g. a warning) means trouble
    if ! [[ $UTIL =~ $numtest ]]
    then
      # Not numeric, so we've probably lost a GPU: log it and reboot
      echo "$(date) - Lost GPU so restarting system. Found GPUs:" | tee -a "${LOG_FILE}"
      echo "" | tee -a "${LOG_FILE}"
      # Hopefully the PCI bus info will help find the faulty GPU
      nvidia-smi --query-gpu=gpu_bus_id --format=csv | tee -a "${LOG_FILE}"
      echo "reboot in 10 seconds"
      echo ""
      sleep 10
      sudo reboot
      # Stop here: otherwise the non-numeric value still reaches the -lt test
      # below, and a reboot is requested again for every word of the warning
      exit 0
    fi

    # If utilization is lower than the threshold, count it:
    if [ "$UTIL" -lt "$THRESHOLD" ]
    then
      echo "$(date) - GPU under threshold found"
      echo ""
      let COUNT=COUNT-1
    fi
    let GPU=GPU+1
  done

Just replace the old "for UTIL in $UTILIZATIONS ..." loop in the watchdog with this new block.
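Before putting the new loop on a live rig, you may want to exercise it without rebooting anything. Below is a minimal dry-run sketch, assuming the loop is wrapped in a function. The names check_utilizations, fake_reboot and REBOOT_REQUESTED, the temporary log and the THRESHOLD value are all hypothetical stand-ins. The starting values of COUNT and GPU are arbitrary because the rest of the watchdog script is not shown here. The real loop calls sudo reboot; the stub only records the request, and the 10-second sleep is left out.

```shell
#!/usr/bin/env bash
# Dry-run harness (hypothetical): same checks as the new loop, but the
# reboot is replaced by a stub and the sleep is dropped.

LOG_FILE=$(mktemp)          # throwaway log instead of the real watchdog log
trap 'rm -f "$LOG_FILE"' EXIT
THRESHOLD=80                # example value only
numtest='^[0-9]+$'
REBOOT_REQUESTED=0

# Stub standing in for "sudo reboot": just remember that a reboot was asked for
fake_reboot() {
  REBOOT_REQUESTED=1
}

# Runs the loop over the list in $1; returns 1 if a reboot was requested
check_utilizations() {
  local UTILIZATIONS=$1
  COUNT=0                   # arbitrary start; the real script sets its own value
  GPU=0
  for UTIL in $UTILIZATIONS
  do
    if ! [[ $UTIL =~ $numtest ]]
    then
      echo "$(date) - Lost GPU so restarting system." >> "$LOG_FILE"
      fake_reboot
      return 1              # stop checking once a reboot is requested
    fi
    if [ "$UTIL" -lt "$THRESHOLD" ]
    then
      COUNT=$((COUNT - 1))
    fi
    GPU=$((GPU + 1))
  done
  return 0
}

check_utilizations "100 95 40" && echo "no reboot, COUNT=$COUNT, GPU=$GPU"
check_utilizations "100 GPU lost" || echo "reboot requested: $REBOOT_REQUESTED"
```

Feeding it a captured nvidia-smi output string instead of the literal lists shows whether the rig would have rebooted.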