Re: [OS] nvOC easy-to-use Linux Nvidia Mining v0019
Board: Mining (Altcoins)
Post by leenoox on 25/09/2017, 18:43:57 UTC
Hi guys,

I have recently built a new rig with 13 ASUS P106-100 cards on nvOC v0019, running -200cc, 1550mc and PL90. I have a weird problem: after some time, the hashrate on all cards drops by 50% or more and doesn't recover. The miner doesn't restart; it just keeps running at the lower hashrate. This time it happened after 7 hours, whereas with 7 cards the rig was up for over 12 hours with no problems. Has anyone experienced this before, or does anyone know where the problem might be?

I am obviously running headless and using SSH to monitor the rig and adjust the settings.

PS. Neither the miner nor the OS restarts the miner or reboots the system; the hashrate just drops badly, from 328 to 140 MH/s.

Thanks in advance to everyone who can help and also thanks to fullzero for the new version!


I've had a similar issue, where one or two cards would semi-freeze and slow the whole rig to a crawl. I added a bunch of debug options to the watchdog to figure out what was going on. It turns out the watchdog enters a loop that counts down errors before trying to restart the mining process. It takes about 5-10 min to decrement the count by 1, so it doesn't catch the low utilization and restart the rig until a few hours later.
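A minimal sketch of a faster alternative to the countdown described above, assuming Python 3 and `nvidia-smi` on the PATH: each poll records per-GPU utilization, and a card that stays below a threshold for a few consecutive polls triggers a restart. `UTIL_THRESHOLD`, `MAX_LOW_CHECKS`, `parse_utilization`, `LowUtilWatchdog` and `watch` are hypothetical names, the thresholds are illustrative, and this is not nvOC's actual watchdog code. The restart command is left out because it depends on the rig's setup.

```python
import subprocess
import time

# Hypothetical tuning values, not taken from nvOC.
UTIL_THRESHOLD = 80   # % GPU utilization below which a poll counts as "low"
MAX_LOW_CHECKS = 3    # consecutive low polls before asking for a restart


def parse_utilization(csv_text):
    """Parse 'index, utilization' lines from nvidia-smi into {gpu: percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        idx, util = (col.strip() for col in line.split(","))
        if util.isdigit():  # skip cards that report e.g. "[Not Supported]"
            result[int(idx)] = int(util)
    return result


def read_utilization():
    """Query current GPU utilization for every card via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


class LowUtilWatchdog:
    """Counts consecutive low-utilization polls per GPU."""

    def __init__(self, threshold=UTIL_THRESHOLD, max_low=MAX_LOW_CHECKS):
        self.threshold = threshold
        self.max_low = max_low
        self.low_counts = {}

    def check(self, utilization):
        """Update counters; return True if any GPU has been low too long."""
        for gpu, util in utilization.items():
            if util < self.threshold:
                self.low_counts[gpu] = self.low_counts.get(gpu, 0) + 1
            else:
                self.low_counts[gpu] = 0  # a healthy poll resets the counter
        return any(n >= self.max_low for n in self.low_counts.values())


def watch(poll_seconds=10):
    """Poll until a restart is warranted, then return to the caller."""
    wd = LowUtilWatchdog()
    while True:
        if wd.check(read_utilization()):
            return  # caller restarts the miner here (rig-specific, not shown)
        time.sleep(poll_seconds)
```

With a 10-second poll and `MAX_LOW_CHECKS = 3`, a stuck card would be flagged after roughly 30 seconds instead of hours. The trade-off is that short, legitimate utilization dips (for example a pool switch) could cause unnecessary restarts, so both values would need tuning per rig.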

The mining doesn't stop but continues at reduced hash speed: some cards drop from 25 to 13 MH/s, some down to 0.

To make it worse, during this semi-hang the driver changes P-states due to low/high utilization, so some cards end up at P2, some at P8 and some at P0. This screws up the OC and causes more cards to hang.
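To confirm the mixed P-state symptom on a rig, each card's current performance state can be read with `nvidia-smi --query-gpu=index,pstate --format=csv,noheader`. The sketch below, again Python 3 with hypothetical helper names (`parse_pstates`, `read_pstates`, `mixed_pstates`), lists the cards whose P-state differs from the one most cards are in. It only reports; it does not change any clocks.

```python
import subprocess
from collections import Counter


def parse_pstates(csv_text):
    """Parse 'index, pstate' lines from nvidia-smi into {gpu: 'P2', ...}."""
    states = {}
    for line in csv_text.strip().splitlines():
        idx, pstate = (col.strip() for col in line.split(","))
        states[int(idx)] = pstate
    return states


def read_pstates():
    """Query the current P-state of every GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pstate", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_pstates(out)


def mixed_pstates(states):
    """Return sorted GPU indices whose P-state differs from the most common one."""
    if not states:
        return []
    majority, _ = Counter(states.values()).most_common(1)[0]
    return sorted(gpu for gpu, p in states.items() if p != majority)
```

For output lines `0, P2`, `1, P2`, `2, P8` and `3, P0`, `mixed_pstates` returns `[2, 3]`, the two cards that have left the state the rest of the rig is running in.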

I have some ideas for modifying, or even rewriting, the watchdog to address these issues, and will post it once finished. I'm too busy at the moment to work on it, but I can post the logs from the added debug output so that the devs can look into it.

Anyway, it turns out all of this was caused by two cards (same brand/model/memory) that couldn't handle the same overclock as the rest of the cards. Reducing the memory overclock on these two from 1820 to 1700 stabilized the rig: no more hangups.

For now, reduce your overclock and see if that fixes the problem. Please post back with the results.

Have you managed to improve your watchdog as you said in this previous post?

Hey CryptAtomeTrader44,
I had some code that I was testing; however, updating to 19-1 overwrote the watchdog file along with all the changes I had made, and I had no backup of the modified wd :(
I have to start over now; hopefully I will have a working beta version by the end of the week. I will post once I am done.