Board Mining (Altcoins)
Re: [OS] nvOC easy-to-use Linux Nvidia Mining vBASIC || Community Edition 2.0
by papampi on 01/05/2018, 06:11:45 UTC
Error on rig 2 - Two different rigs, crashing within a minute of each other. Tell me that isn't weird.

m1@m1-desktop:~$ tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
Watchdog for nvOC v0019-2.0 - Community Release
Version: v0019-2.0.011

LOG FILE: (Showing the last 10 recorded entries)
| 12  |    120W     |  3.42 Sol/W  |
+-----+-------------+--------------+
INFO 09:34:50: GPU3 Accepted share 186ms [A:454, R:1]
INFO 09:34:51: GPU7 Accepted share 187ms [A:477, R:1]
CRITICAL: Sun Apr 29 09:35:17 MST 2018 - GPU Utilization is too low: restarting 3main...
Mon Apr 30 22:35:29 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:0F:00.0: GPU is lost.  Reboot the system to recover this GPU


Mon Apr 30 22:35:30 MST 2018 - reboot in 10 seconds


If both rigs crash and freeze at the same time, it could be an electrical problem.
I had almost the same issue a while back: several of my rigs were crashing at the same time.
It turned out that whenever one of the room's ventilation fans switched on, it put high-frequency noise on the power line, and 3-4 rigs would lose a GPU at the same moment and reboot.
After a month of pulling my hair out trying to find the cause, I replaced that fan and the problem was solved.


Open 5watchdog
Change:
Code:
       echo "$(date) - Lost GPU so restarting system. Found GPU's:" | tee -a ${LOG_FILE}
To:
Code:
       echo "$(date) - Lost GPU $GPU so restarting system. Found GPU's:" | tee -a ${LOG_FILE}

That way the log shows which GPU was lost, and you can check whether it is always the same one.
If it is always the same GPU, remove it from the rig and test again; the GPU, its riser, or its power cable may be faulty.
If the problem moves to another GPU after you remove it, it could be a power problem.
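
Once the watchdog has logged a few of the modified "Lost GPU" lines, a small helper can tally which GPU shows up most often. This is only a rough sketch: count_lost_gpus is a made-up name, and it assumes $GPU expands to a single token with no spaces (for example an index like 3), which I have not checked against the actual 5watchdog code. Lines written in the old format (without a GPU number) are ignored.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of nvOC): tally "Lost GPU <id>" entries in a log.
# Assumes the modified echo line from above is in use and that $GPU is one
# token without spaces; adjust the pattern if your log looks different.
count_lost_gpus() {
    local log_file="$1"

    # Pull the token between "Lost GPU " and " so restarting system",
    # then count how often each one occurs, most frequent first.
    sed -n 's/.* - Lost GPU \([^ ][^ ]*\) so restarting system.*/\1/p' "$log_file" \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{ print "GPU " $2 ": " $1 }'
}

# Example call (path taken from the log shown above):
#   count_lost_gpus /home/m1/nvoc_logs/watchdog-screenlog.0
```

If one GPU clearly dominates the list, start with that card, its riser and its power cable. If the losses are spread evenly over several GPUs, that points more toward power.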