Hi Fullzero,
I want to share a GPU failure that the watchdog is not able to detect.
wdog screen:
GPU UTILIZATION: Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Unable: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: determine: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: device: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: handle: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: for: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: 0000:09:00.0:: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: is: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: lost.: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Reboot: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: system: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: recover: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: this: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
Tue Jul 25 16:57:01 CEST 2017 - All good! Will check again in 60 seconds
GPU UTILIZATION: Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Unable: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: determine: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: device: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: handle: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: for: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: 0000:09:00.0:: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: is: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: lost.: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Reboot: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: system: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: recover: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: this: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
Tue Jul 25 16:58:01 CEST 2017 - All good! Will check again in 60 seconds
The miner shows/detects only 6 of the 7 GPUs.
nvidia-smi doesn't work either:
$ nvidia-smi
Unable to determine the device handle for GPU 0000:09:00.0: GPU is lost. Reboot the system to recover this GPU
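Reading the log above, it looks like the watchdog passes nvidia-smi's error text, word by word, to the numeric test on line 44. Each comparison fails with "integer expression expected", and the script falls through to "All good!". Here is a minimal Python sketch of the missing check, not the watchdog's actual code. It assumes the utilization comes from `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` (one integer per line when healthy); `gpu_lost` is a hypothetical helper name.

```python
def gpu_lost(smi_output: str) -> bool:
    """Return True if the output is not exactly one integer per line.

    Assumption: healthy output is one utilization value per GPU, e.g.
    "98\n97\n100\n". Anything else (such as the "GPU is lost" message)
    is treated as a failure instead of being compared as a number.
    """
    lines = [ln.strip() for ln in smi_output.splitlines() if ln.strip()]
    if not lines:
        # No output at all is also a failure.
        return True
    return any(not ln.isdigit() for ln in lines)
```

With a check like this in front of the utilization comparison, the error message would trigger the reboot path instead of being treated as a list of numbers.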
temp screen:
Provided power limit 75.00 W is not a valid power limit which should be between 115.00 W and 291.00 W for GPU 00000000:0A:00.0
Terminating early due to previous errors.
Tue Jul 25 17:01:07 CEST 2017 - All good, will check again soon
GPU 0, Target temp: 61, Current: 60, Diff: 1, Fan: 75, Power: 123.46
GPU 1, Target temp: 61, Current: 60, Diff: 1, Fan: 63, Power: 124.62
GPU 2, Target temp: 61, Current: 59, Diff: 2, Fan: 77, Power: 119.23
GPU 3, Target temp: 61, Current: 60, Diff: 1, Fan: 68, Power: 120.72
GPU 4, Target temp: 61, Current: 59, Diff: 2, Fan: 57, Power: 124.26
GPU 5, Target temp: 61, Current: Unable, Diff: 61, Fan: to, Power: determine
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL: line 125: [: Unable: integer expression expected
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL: line 158: [: the: integer expression expected
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL: line 171: [: to: integer expression expected
GPU 6, Target temp: 61, Current: 55, Diff: 6, Fan: 50, Power: 126.76
Tue Jul 25 17:01:37 CEST 2017 - Restoring Power limit for gpu:6. Old limit: 125 New limit: 75 Fan speed: 50
Provided power limit 75.00 W is not a valid power limit which should be between 115.00 W and 291.00 W for GPU 00000000:0A:00.0
Terminating early due to previous errors.
Tue Jul 25 17:01:37 CEST 2017 - All good, will check again soon
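The temp screen shows two separate problems: the row for GPU 5 is filled with words from the error message ("Unable", "to", "determine"), and a 75 W limit is rejected because the card only accepts 115 W to 291 W. Below is a rough Python sketch of how both could be guarded against. It is not the temperature script's actual code. It assumes each row comes from `nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw --format=csv,noheader,nounits`, and that the allowed range is read from the `power.min_limit` and `power.max_limit` query fields. Both function names are hypothetical.

```python
def parse_gpu_row(row):
    """Parse a 'temp, fan, power' CSV row into (int, int, float).

    Returns None when the row is not three numeric fields, which is
    what happens when nvidia-smi prints an error instead of data.
    """
    parts = [p.strip() for p in row.split(",")]
    if len(parts) != 3:
        return None
    try:
        return int(parts[0]), int(parts[1]), float(parts[2])
    except ValueError:
        return None


def clamp_power_limit(requested, min_limit, max_limit):
    """Keep a requested power limit inside the range the card accepts."""
    return max(min_limit, min(requested, max_limit))
```

For the card in the log, `clamp_power_limit(75, 115, 291)` gives 115, so the script would request the lowest limit the card accepts instead of failing with "not a valid power limit".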
I believe the lost-GPU failure is exactly the problem that Maxximus007 recently wrote a new code block to resolve.
Fullzero,
I'm getting this error as well, and it looks like the watchdog is not rebooting the system.
I believe I have the latest bash files.
Are Maxximus007's changes to resolve this issue in the current bash files?
Thank you.
GPU UTILIZATION: Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Unable: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: determine: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: device: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: handle: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: for: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: 0000:01:00.0:: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: is: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: lost.: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: Reboot: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: the: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: system: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: to: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: recover: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: this: integer expression expected
/home/m1/IAmNotAJeep_and_Maxximus007_WATCHDOG: line 44: [: GPU: integer expression expected
Sat Jul 29 21:07:09 PDT 2017 - All good! Will check again in 60 seconds
I run into the same issue sometimes. Here is how I fixed it: I wrote a Python script that checks for GPU failures and reboots the rig if it is not mining with all the GPUs.
Add the cron entry for it at the bottom of your crontab and save.
Now, every 5 minutes (you can adjust this), it will check whether all of your cards are up and running and reboot the rig if there is a problem. So far this is making my life easier. Hope it helps you.
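The script itself is not shown in this post, so here is a rough sketch of what a cron-driven check like this could look like. It is not the poster's script. The expected card count of 7, the `sudo reboot` call, and all function names are assumptions. It treats a non-zero exit code, the "GPU is lost" message, or too few listed GPUs as a reason to reboot.

```python
import subprocess

EXPECTED_GPUS = 7  # assumption: number of cards installed in the rig


def query_gpus():
    """Run nvidia-smi and return (exit_code, combined stdout and stderr)."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hung driver counts as a failure.
        return 1, str(exc)
    return proc.returncode, proc.stdout + proc.stderr


def needs_reboot(exit_code, output, expected=EXPECTED_GPUS):
    """Decide whether the rig should reboot based on nvidia-smi's result."""
    if exit_code != 0 or "GPU is lost" in output:
        return True
    visible = [ln for ln in output.splitlines() if ln.strip().isdigit()]
    return len(visible) < expected


def check_and_reboot(expected=EXPECTED_GPUS, query=query_gpus, reboot=None):
    """Query the GPUs once; reboot if any are missing. Returns True on reboot."""
    exit_code, output = query()
    if not needs_reboot(exit_code, output, expected):
        return False
    if reboot is None:
        # Assumption: the user may run reboot through sudo without a password.
        subprocess.run(["sudo", "reboot"], check=False)
    else:
        reboot()
    return True
```

To use it from cron, the script would end with a call to `check_and_reboot()` and be scheduled with an entry starting `*/5 * * * *` for the 5-minute interval. The `query` and `reboot` parameters exist only so the logic can be tried out without touching the real rig.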