Post
Topic
Board Mining (Altcoins)
Re: [OS] nvOC easy-to-use Linux Nvidia Mining vBASIC || Community Edition 2.0
by
WaveFront
on 02/06/2018, 14:47:26 UTC
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
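#!/bin/bash
# Enable all SysRq functions
echo 1 > /proc/sys/kernel/sysrq
# (S)ync cached data to disk
echo s > /proc/sysrq-trigger
# (U)mount: remount all filesystems read-only
echo u > /proc/sysrq-trigger
# (S)ync again
echo s > /proc/sysrq-trigger
# re(B)oot immediately
echo b > /proc/sysrq-trigger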


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol
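
The replacement described above could be done in one pass with sed. This is only a minimal sketch: the function name swap_reboot_command is made up for illustration, and the file locations in the usage comment are an assumption, so point it at wherever 5watchdog and 6tempcontrol live on your rig.

```shell
#!/bin/bash
# Replace every "sudo reboot" with "sudo magicreboot.sh" in the given files.
# A .bak copy of each file is kept so the change can be reverted.
# (swap_reboot_command is a hypothetical helper, not part of nvOC.)
swap_reboot_command() {
    local f
    for f in "$@"; do
        sed -i.bak 's/sudo reboot/sudo magicreboot.sh/g' "$f"
    done
}

# Hypothetical usage; the paths are an assumption:
# swap_reboot_command ~/5watchdog ~/6tempcontrol
```

Running it a second time changes nothing, since the new command no longer contains the text "sudo reboot". Note that sudo has to be able to find magicreboot.sh, so either put it in a directory on sudo's path or use its full path in the replacement.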

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach, but was not successful.
Does it need to sync twice?
(u) remounts the filesystems as read-only, so I don't think it can sync data to disk after u.


Isn't it better to do the full REISUB sequence?


Edit:
If you have a faulty GPU that causes a system freeze, can you please test this full REISUB sequence:

Code:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*mount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger

Yeah, I've tested it for 2 days now and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.



I think the 2nd (S)ync won't do anything, as the filesystems are already read-only after the previous (U)mount step.
Can you please check the full REISUB and see how it works?
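
For anyone comparing the sequences discussed above (B only, SUSB, SUB, full REISUB), a small helper that takes the keys as arguments makes it easier to swap them around. This is a rough sketch, not a tested fix: sysrq_sequence and the 2 second pause are made up for illustration. One thing to verify on your own rig: as far as the kernel SysRq documentation describes it, e and i signal every process except init, which would include the script itself, so a scripted REISUB may never reach the s, u and b steps.

```shell
#!/bin/bash
# Write each SysRq key to the trigger file, pausing between steps.
# Arguments: trigger path, delay in seconds, then the keys in order.
# (sysrq_sequence is a hypothetical helper name.)
sysrq_sequence() {
    local trigger="$1" delay="$2"
    shift 2
    local key
    for key in "$@"; do
        echo "sysrq: $key"          # log the step before firing it
        echo "$key" > "$trigger"
        sleep "$delay"
    done
}

# Hypothetical usage as root (this reboots the machine right away):
# echo 1 > /proc/sys/kernel/sysrq
# sysrq_sequence /proc/sysrq-trigger 2 s u b
```

Passing the trigger path as an argument also lets you dry-run the sequence against an ordinary file before pointing it at /proc/sysrq-trigger.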
Hey, very interesting subject. After several tries, I approached the kernel panic problem with a hardware solution.
I have a Raspberry Pi with one of its GPIOs interfaced to the reset pins of the motherboard. The RPi checks the status of the rig's SSH port every 30 seconds (I find it more reliable than just pinging the mobo).
If the mobo is unresponsive for more than 10 minutes, the RPi resets the rig.
I will publish the RPi scripts and schematics as soon as I have 10 minutes :-D
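
Until those scripts are posted, the logic described above might look roughly like this on the Pi. It is only a sketch under stated assumptions, not the poster's script: the rig address, the GPIO number, the legacy /sys/class/gpio interface, and the active-high 1 second reset pulse are all guesses that depend on your wiring and OS image.

```shell
#!/bin/bash
# Sketch of the Raspberry Pi side: poll the rig's SSH port and pulse a
# GPIO wired to the motherboard reset pins after a long outage.
# RIG_HOST, RESET_GPIO and the sysfs GPIO paths are assumptions.
RIG_HOST="${RIG_HOST:-192.168.1.50}"
RESET_GPIO="${RESET_GPIO:-17}"
CHECK_INTERVAL=30      # seconds between checks
MAX_DOWNTIME=600       # 10 minutes

# Return 0 if a TCP connection to host ($1) port ($2) opens within 5 seconds.
port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Return 0 once the accumulated downtime ($1) has reached the limit ($2).
should_reset() {
    [ "$1" -ge "$2" ]
}

# Drive the reset line high for one second (assumes an active-high circuit).
pulse_reset() {
    local gpio_dir="/sys/class/gpio/gpio$RESET_GPIO"
    [ -d "$gpio_dir" ] || echo "$RESET_GPIO" > /sys/class/gpio/export
    echo out > "$gpio_dir/direction"
    echo 1 > "$gpio_dir/value"
    sleep 1
    echo 0 > "$gpio_dir/value"
}

# Main loop: count downtime in CHECK_INTERVAL steps, reset when it runs out.
watchdog_loop() {
    local down=0
    while true; do
        if port_open "$RIG_HOST" 22; then
            down=0
        else
            down=$((down + CHECK_INTERVAL))
        fi
        if should_reset "$down" "$MAX_DOWNTIME"; then
            pulse_reset
            down=0     # give the rig a full window to boot again
        fi
        sleep "$CHECK_INTERVAL"
    done
}

# Start with: watchdog_loop
```

Resetting the counter after a pulse gives the rig another 10 minutes to come back before the next reset, which avoids hammering the reset line while it is still booting.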