Hey all, got what amounts to a heads-up for folks. (Not sure there's a solution...)
I have a rig running Win10 with two nVidia GPUs. One is a 3060, so I have driver v470.05 installed to unlock full hashrate for that GPU; that card is installed in an x16 slot and cabled up to HDMI. The other nVidia card is a 3060 Ti (non-LHR), and the system also has a pair of AMD RX580-8GB cards in it. PM is in charge of all hardware control for all four cards.
Things were completely stable until yesterday. The rig was still running PM v5.7b (just laziness on my part - no incentive to update PM's bits, as nothing was having an issue).
As I said, all was well - the rig was up over 600 hours non-stop until I un-paused updates and allowed the monthly cumulative updates for Windows to be installed yesterday. (I do this manually, every month.)
Now, for these two nVidia cards on that driver version, PM has lost its ability to set clock speeds for GPU and memory. Instead, I get messages with error -137, like this:
GPU1: Unable to reset memory clock delta - error -137
GPU3: Unable to reset memory clock delta - error -137
GPU1: Unable to set GPU clock delta to -400 MHz - error -137
GPU1: Unable to set memory clock delta to 1200 MHz - error -137
GPU3: Unable to set GPU clock delta to -200 MHz - error -137
GPU3: Unable to set memory clock delta to 1370 MHz - error -137
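If anyone wants to pull these messages out of a longer log to see which cards and which settings are affected, here is a rough Python sketch. It assumes the messages always look exactly like the six lines above; the function name and the regex are mine for illustration, not anything PM provides.

```python
import re
from collections import defaultdict

# Matches both message forms seen above:
#   "GPU1: Unable to reset memory clock delta - error -137"
#   "GPU1: Unable to set GPU clock delta to -400 MHz - error -137"
# Assumption: the wording is exactly as in the pasted lines.
CLOCK_ERROR = re.compile(
    r"GPU(?P<gpu>\d+): Unable to (?P<action>set|reset) "
    r"(?P<clock>GPU|memory) clock delta"
    r"(?: to (?P<mhz>-?\d+) MHz)? - error (?P<code>-?\d+)"
)

def parse_clock_errors(lines):
    """Group clock-delta failures by GPU index (hypothetical helper)."""
    failures = defaultdict(list)
    for line in lines:
        m = CLOCK_ERROR.search(line)
        if m is None:
            continue  # not a clock-delta failure line
        # "reset" messages carry no MHz value, so store None for those
        mhz = int(m["mhz"]) if m["mhz"] is not None else None
        failures[int(m["gpu"])].append(
            (m["action"], m["clock"], mhz, int(m["code"]))
        )
    return dict(failures)

sample = [
    "GPU1: Unable to reset memory clock delta - error -137",
    "GPU3: Unable to reset memory clock delta - error -137",
    "GPU1: Unable to set GPU clock delta to -400 MHz - error -137",
    "GPU1: Unable to set memory clock delta to 1200 MHz - error -137",
    "GPU3: Unable to set GPU clock delta to -200 MHz - error -137",
    "GPU3: Unable to set memory clock delta to 1370 MHz - error -137",
]

result = parse_clock_errors(sample)
for gpu, items in sorted(result.items()):
    print(f"GPU{gpu}:")
    for action, clock, mhz, code in items:
        target = "" if mhz is None else f" to {mhz} MHz"
        print(f"  {action} {clock} clock delta{target} -> error {code}")
```

Grouping by GPU makes it easy to confirm that only the two nVidia cards show up and that every failure carries the same error code.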
I can only assume that something in this month's Win10 patches is interfering with clock control, and that it could be (probably is) specific to this particular nVidia driver. One thing I did note was that for the first 20 minutes after the post-patch reboot, everything was still normal as far as mining was concerned. But 22 minutes into it, I saw this appear in the log:
2021.12.17:19:03:55.165: gps3 CUDA error in CudaProgram.cu:256 : unknown error (999)
2021.12.17:19:03:55.165: GPU3 CUDA error in CudaProgram.cu:474 : unknown error (999)
2021.12.17:19:03:55.165: GPU3 GPU3 search error: unknown error
2021.12.17:19:03:55.165: gps3 CUDA error in CudaProgram.cu:256 : unknown error (999)
2021.12.17:19:03:55.165: wdog Fatal error detected. Restarting.
2021.12.17:19:03:55.165: eths Eth: New job #7a9544e3 from ssl://us1.ethermine.org:5555; diff: 4295MH
2021.12.17:19:03:55.219: GPU1 CUDA error in CudaProgram.cu:474 : unknown error (999)
2021.12.17:19:03:55.219: GPU1 GPU1 search error: unknown error
(GPU3 is the 3060, GPU1 is the 3060Ti)
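For anyone who wants to check their own logs for the same pattern, below is a rough Python sketch that scans lines in this format for CUDA errors and watchdog restarts. It assumes every line starts with a timestamp and a short thread tag exactly as in the excerpt above; the helper name and regexes are my own, not part of PM. (As far as I know, 999 is the CUDA runtime's generic "unknown error" code, so it doesn't narrow down the cause by itself.)

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
#   "<YYYY.MM.DD:HH:MM:SS.mmm>: <tag> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}\.\d{2}\.\d{2}:\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<tag>\S+) (?P<msg>.*)$"
)
CUDA_ERROR = re.compile(
    r"CUDA error in (?P<file>\S+):(?P<line>\d+) : .+? \((?P<code>\d+)\)"
)

def scan_log(lines):
    """Return (cuda_events, watchdog_restarts); hypothetical helper."""
    events, restarts = [], []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m is None:
            continue  # line does not follow the assumed layout
        ts = datetime.strptime(m["ts"], "%Y.%m.%d:%H:%M:%S.%f")
        err = CUDA_ERROR.search(m["msg"])
        if err:
            events.append(
                (ts, m["tag"], err["file"], int(err["line"]), int(err["code"]))
            )
        elif m["tag"] == "wdog" and "Restarting" in m["msg"]:
            restarts.append(ts)
    return events, restarts

sample = [
    "2021.12.17:19:03:55.165: gps3 CUDA error in CudaProgram.cu:256 : unknown error (999)",
    "2021.12.17:19:03:55.165: GPU3 CUDA error in CudaProgram.cu:474 : unknown error (999)",
    "2021.12.17:19:03:55.165: GPU3 GPU3 search error: unknown error",
    "2021.12.17:19:03:55.165: gps3 CUDA error in CudaProgram.cu:256 : unknown error (999)",
    "2021.12.17:19:03:55.165: wdog Fatal error detected. Restarting.",
    "2021.12.17:19:03:55.165: eths Eth: New job #7a9544e3 from ssl://us1.ethermine.org:5555; diff: 4295MH",
    "2021.12.17:19:03:55.219: GPU1 CUDA error in CudaProgram.cu:474 : unknown error (999)",
    "2021.12.17:19:03:55.219: GPU1 GPU1 search error: unknown error",
]

events, restarts = scan_log(sample)
counts = Counter(tag for _, tag, _, _, _ in events)
print("CUDA errors per tag:", dict(counts))
print("First CUDA error at:", events[0][0])
print("Watchdog restarts:", len(restarts))
```

Running it over a full log and comparing the first CUDA error time against the miner's start time would show whether the roughly 20-minute delay repeats on every run.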
FWIW, there is no difference if I run PM v5.9d with my same config file: I get exactly the same 'unable to set/reset clocks' errors as above when PM is starting up. My next action will be to look through the recently installed updates to see if there is something, like a driver that was downloaded after the reboot, which I might be able to uninstall.
What's truly insidious about this is that if I test by running PM in benchmark mode and use MSI Afterburner (instead of PM) to set core clocks/frequencies/undervolts, that all still works just fine. And this Afterburner is not a recent version; it's a few versions behind because I prefer the one with the older UI. So evidently it is still possible for apps other than PM to set clock deltas after this month's Win10 updates, with the same version of the nVidia driver, and without said app requiring some sort of update.
OK, there you have it. Sorry about the long post. Just sharing with the class. I've been using PM for a long time and love how stable my particular (four) rigs are - or, were - up till now. BTW, none of the other rigs I have require that funky driver, in fact they're using older nVidia drivers than 470.05, and none of them are any worse for the wear after yesterday's patching. Just this funky one. Cheers.