We are getting close to the 0.3.8 release. The watchdog is ready: it catches init errors and stuck GPUs, and runs a user-defined .bat/.sh script.
Trying out the new release now!
I don't see any information in the README on how to use the watchdog feature. Can we specify a user-defined script by passing a parameter?