Hey all... I'm getting the "NVML: cannot get fan speed, error 999 (an internal driver error occurred)" error and cannot isolate it to one card (all are 1060 6GB, but of different makes). Every time I take a suspected card out and move it to a separate instance, the main group still throws the error. I add another card to the suspect group and still get the error, and so forth, until I can fairly say that no single card is at fault. All but two risers are powered by PCIe; the other two are on Molex, but on separate cables. Total power draw is no more than 80% of the PSU's rating at any time. I _think_ I'm doing everything right. I've even tried saving the OCs so that they still hold when I quit Afterburner. (I thought it might be AB's monitoring that was interfering.)
Here's the thing, though... unlike what other people report, this error does _not_ cause the miner to restart, nor does it cause Windows to freeze or restart. Also, the error does not show up in the logs straight away; it typically takes 4-6 hours.
Now I know what some might say... "if it isn't crashing, don't worry." But I can't help but think that if this error is being reported, it's for a reason (a hit on the number of shares earned, a bad connection somewhere, impending doom ;-) ) ... or there could simply be a bug in the program.
Thanks for any hints as to what might be going on.
potificate
P.S. I am using miner version 11.6 and NVIDIA driver 391.35, and I did not have this issue when running 1050 Ti's on the same rig.
Error 999 is specific to NVIDIA cards and is generally not critical. The root cause can be a riser, a cable, a loose connection, etc. I used to worry about a crash whenever I saw error 999, so I would reboot the rig. But since 11.2, I have kept running without taking any action if it only misses fan info (since I set the fans to a constant speed via Afterburner). The rig then runs anywhere from 1 to 17 more days until a new issue causes it to hang (none of those issues were related to the GPU that initially showed error 999). If it misses temperature, I have to restart or reboot right away to fix the issue.
Six things from my observations:
1) If error 15 arises after error 999, your rig might hang.
2) If the GPU that initially showed error 999 crashes, the Afterburner OC settings might reset to default (slower hashrate, higher power consumption, much higher temperature).
3) If 2) occurs, error 999 might spread to the other NVIDIA cards like a virus.
4) If you use a UPS, another power outage might trigger error 999 (initially missing only fan info, now missing both fan and temp).
5) Missing fan info is not as critical as missing temp.
6) Sometimes, after you close a miner in which a GPU showed error 999, Afterburner will not start (NVIDIA driver issue).
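
If you want to apply those rules of thumb to a log file instead of eyeballing it, here is a rough Python sketch. It is only an illustration, not anything the miner ships with. The regex is modeled on the single message quoted at the top of this thread; I am assuming that temperature failures and error 15 are logged in the same "NVML: cannot get ..., error N" shape, which may not be true for your miner version. The function name `triage` and the verdict strings are made up for this example.

```python
import re
from collections import Counter

# Assumed message shape, modeled on the one line quoted in this thread:
#   NVML: cannot get fan speed, error 999 (an internal driver error occurred)
# Temperature and error-15 lines are ASSUMED to follow the same pattern.
NVML_LINE = re.compile(r"NVML: cannot get (?P<what>[^,]+), error (?P<code>\d+)")


def triage(log_lines):
    """Return (verdict, counts) for an iterable of log lines.

    Verdicts (hypothetical labels), from most to least severe:
      "hang-risk": an error 15 appeared after an error 999 (observation 1)
      "restart":   a temperature reading was missed (observation 5)
      "watch":     only fan info (or some other reading) was missed
      "ok":        no NVML read errors were found
    """
    counts = Counter()
    saw_999 = False
    error_15_after_999 = False

    for line in log_lines:
        match = NVML_LINE.search(line)
        if match is None:
            continue
        what = match.group("what").strip().lower()
        code = int(match.group("code"))
        counts[(what, code)] += 1
        if code == 999:
            saw_999 = True
        elif code == 15 and saw_999:
            error_15_after_999 = True

    if error_15_after_999:
        return "hang-risk", counts
    if any("temp" in what for what, _code in counts):
        return "restart", counts
    if counts:
        return "watch", counts
    return "ok", counts


if __name__ == "__main__":
    sample = [
        "NVML: cannot get fan speed, error 999 (an internal driver error occurred)",
    ]
    verdict, counts = triage(sample)
    print(verdict)  # prints "watch"
```

To run it on a real log, pass the open file object in place of `sample`, e.g. `triage(open(path, errors="replace"))`, where `path` points at your own miner log.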