I'm seeing odd behavior with 144.5 mining (zelcash in this case, on 2 Vega 64s and 3 Vega 56s). If I lose the connection to the pool, my Linux box crashes and reboots the moment the worker is reauthorized:
-----------------
Average speed (5s): 34.0 sol/s | 32.6 sol/s | 33.4 sol/s | 30.8 sol/s | 27.6 sol/s Total: 158.3 sol/s
Submitting share
Share accepted
Submitting share
Submitting share
Share accepted
Share accepted
Average speed (5s): 34.2 sol/s | 32.0 sol/s | 30.6 sol/s | 29.0 sol/s | 28.2 sol/s Total: 154.0 sol/s
Submitting share
Share accepted
Lost connection to stratum server equigems.online:9000 or server not reachable.
Trying to connect in 5 seconds
Lost connection to stratum server equigems.online:9000 or server not reachable.
Trying to connect in 5 seconds
Connected to equigems.online:9000
Subscribed to stratum server
New target received: 00a0000000000000
New job received: cccd
Authorized worker:
-----------------
and then the immediate reboot. I don't see a kernel panic in the log, but I see that Ubuntu 18 uses syslog caching by default, so the last messages before the crash may never have made it to disk. I've turned caching off, and we'll see if it catches something on the next crash.
This is more of an FYI unless you see something in your reconnect code that might cause a panic. I'm not even sure whether the lost connection is a symptom of an imminent crash or the actual cause. Who knows, it could simply be instability in one of my risers...
I'll see if I can track down more info in the logs next time.
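In the meantime, here's a rough Python sketch of how the miner's output could be captured so that the last lines before a hard reset actually reach the disk. It's only an illustration of the idea: `append_durably` and the log file name are names I made up, and it assumes the miner's console output is piped into it line by line. It only covers the miner's own output, not kernel messages.

```python
import os


def append_durably(log_path, lines):
    """Append each line to log_path and force it to disk before taking the next.

    `lines` can be any iterable of strings, e.g. sys.stdin when the miner's
    output is piped into a small wrapper script (hypothetical setup).
    Returns the number of lines written.
    """
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for line in lines:
            # Normalise so every entry ends with exactly one newline.
            log.write(line.rstrip("\n") + "\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to commit the data to disk
            count += 1
    return count
```

A wrapper would just call `append_durably("miner.log", sys.stdin)` with the miner piped into it. Syncing on every line is slow, but at a few lines every 5 seconds that shouldn't matter, and the point is that whatever was printed right before the reboot is still in the file afterwards.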
edit: I also notice that if I ctrl-c out of the miner, it's a 50/50 toss-up whether restarting the miner causes a reboot on reconnect, so this is likely related. Perhaps the card is left in a bad state after an abnormal stop, and reusing that card triggers the panic? I find that I usually have to power off to clear the cards before the system is stable again. Just speculating...
I'm undervolting and using the latest AMD 18.30 drivers for Ubuntu.