The constant reboots on systems with a lot of GPUs seem to be related to individual cards. Since the overclock applies to the whole machine, you could have a single card (or a few cards) that can't handle the clock, and when it hangs the whole machine reboots. Here is a screenshot of two identical machines with the same HW: the 1st one reboots every hour, the 2nd one does not. So my suggestion to keep the log files "current minus 1" will help to see which card to underclock. Can we enter clock values separated by commas, one per GPU, like 100,150,80,200,100.... ?
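To make the request concrete, here is a minimal sketch of how such a comma-separated setting could be interpreted. This is only an illustration of the idea, not how the software actually works: the function name `parse_per_gpu_clocks` and the rule "a single value applies to every card" are my own assumptions.

```python
def parse_per_gpu_clocks(value, gpu_count):
    """Hypothetical helper: expand a clock setting into one value per GPU.

    "100"                -> same clock for every card (assumed behaviour)
    "100,150,80,200,100" -> one clock per card, in GPU order
    """
    # Split on commas and ignore empty pieces such as a trailing comma.
    parts = [int(p) for p in value.split(",") if p.strip()]
    if not parts:
        raise ValueError("no clock values given")
    if len(parts) == 1:
        # A single value keeps the current whole-machine behaviour.
        return parts * gpu_count
    if len(parts) != gpu_count:
        raise ValueError(
            "got %d clock values for %d GPUs" % (len(parts), gpu_count)
        )
    return parts


clocks = parse_per_gpu_clocks("100,150,80,200,100", 5)
```

With something like this, the one unstable card could be given a lower value while the rest keep their clocks.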
https://preview.ibb.co/c7gYyw/11.png
Hello Sony87,
I checked your picture. Please tell me, what did you set up to see C/fan AND CORE/MEM for all of your cards?
I have 11x RX580 cards. In the C/fan field I see 11 values, but in the core/mem field only 10.
https://gyazo.com/d22446cf9fb513cf6740ca6cbd2111f9