Hello guys.
First of all, big thanks to all the team for the great job on this project.
I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
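Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
echo s > /proc/sysrq-trigger      # sync filesystems
echo u > /proc/sysrq-trigger      # remount filesystems read-only
echo s > /proc/sysrq-trigger      # sync again
echo b > /proc/sysrq-trigger      # reboot immediately, without a clean shutdown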
Edit 5watchdog: change all occurrences of sudo reboot to
sudo magicreboot.sh
Do the same in 6tempcontrol.
For more reading about SysRq:
https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team
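The edit to 5watchdog and 6tempcontrol described above could also be scripted. This is only a minimal sketch: the function name patch_reboot_calls is hypothetical, and it assumes the files contain the literal text "sudo reboot" and that magicreboot.sh can be found by sudo on your rig. It keeps a .bak copy of each file it touches.

```shell
#!/bin/bash
# Hypothetical helper (not part of nvOC): replace every literal
# "sudo reboot" in the given files with "sudo magicreboot.sh".
# sed -i.bak edits in place and leaves the original as <file>.bak.
patch_reboot_calls() {
    local f
    for f in "$@"; do
        sed -i.bak 's/sudo reboot/sudo magicreboot.sh/g' "$f"
    done
}
```

Run from the directory that holds the two scripts, it would be called as: patch_reboot_calls 5watchdog 6tempcontrol. Running it twice is harmless for the scripts themselves, since the replaced text no longer matches the pattern, but the second run overwrites the .bak copies with the already patched files.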

As explained in the Wikipedia page you linked, those magic SysRq keys cannot work once a kernel panic has occurred. Months ago I had a faulty riser as well and got the "GPUx fallen off the bus" error, but it never caused a kernel panic: the watchdog detected the miner error state and restarted the miner (with one GPU less) without rebooting. What is this script intended to do?
Note that I also have the Intel WDT driver in use; maybe that is why I never experienced the reboot hangs you mentioned.