I have a rig that crashes every 3 to 4 hours, and I cannot pinpoint why.
The crash somehow affects the Ethernet interface (or maybe the TCP/IP stack): the rig becomes unreachable over SSH and doesn't respond to pings.
It basically needs to be reset manually. This is quite annoying, as I live several hours away from where the rigs are.
Is there a way to investigate the crash and find a solution?
We are working on adding an option to the watchdog that resets the network manager, or reboots the rig, if the rig can't reach the router.
Hopefully it will be added to v0019-2.0 soon.
v0019-2.0 is almost done; there are just some last edits to the watchdog, and then we will announce it.
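
Until that lands, the idea of "restart the network manager, or reboot, if the router is unreachable" can be sketched roughly as below. This is not the nvOC watchdog, only a minimal illustration under assumptions: the router address `192.168.1.1`, the thresholds, and the function names are all hypothetical, the rig is assumed to use NetworkManager under systemd, and the script would need to run as root.

```python
import subprocess
import time

ROUTER_IP = "192.168.1.1"  # assumption: replace with your router's address
RESTART_AFTER = 3          # hypothetical: failed checks before restarting networking
REBOOT_AFTER = 6           # hypothetical: failed checks before rebooting the rig


def router_reachable(ip, timeout_s=2):
    """Return True if a single ICMP ping to ip gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def next_action(failures, restart_after=RESTART_AFTER, reboot_after=REBOOT_AFTER):
    """Decide what to do after a given number of consecutive failed checks."""
    if failures >= reboot_after:
        return "reboot"
    if failures == restart_after:
        # Restart networking once, then give it a few more checks to recover.
        return "restart_network"
    return "none"


def run_watchdog(interval_s=60):
    """Check the router periodically and escalate if it stays unreachable."""
    failures = 0
    while True:
        if router_reachable(ROUTER_IP):
            failures = 0
        else:
            failures += 1
        action = next_action(failures)
        if action == "restart_network":
            subprocess.run(["systemctl", "restart", "NetworkManager"])
        elif action == "reboot":
            subprocess.run(["systemctl", "reboot"])
        time.sleep(interval_s)
```

Calling `run_watchdog()` from a root-owned service would start the loop. With the values above, networking is restarted after about 3 minutes without a reply from the router, and the rig reboots after about 6.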
This is great!! nvOC is becoming by far the best Linux mining distro (better than the commercial ones). I am really amazed by your work, guys.
In your experience, are these types of crashes caused by hardware or software, or could it be anything?
The only cause I have seen so far for a full freeze was overclocking the first card a little too much. Try lowering the OC settings for your GPU0 a bit.