I'm the one who took this screenshot, and I'm responsible for hosting _kai_'s machine.
Just so you know, the cgminer output paste was from a different boot of this machine. During that boot, it appeared that cgminer wasn't starting automatically, and the stats screen just showed 0s on the top line (hashrate, elapsed, etc.) and blanks everywhere else (pool info, ASIC status, fan speed). Because of this, I decided to SSH in and start it from the command line to get more debugging information. At the same time, I unplugged two hashboards, in the hope that the remaining hashboard (the bottom one according to both the screenshot and the physical position in the rig) would work. I then saw all of this HW error stuff in the log output. Later on, I switched hashboards and got it to run again with only the top hashboard. I then plugged in all three hashboards again, and got the attached screenshot.
The PSU is a server-grade DPS1200FB running on 240 V. It supplies an open-circuit voltage of 12.30 V and usually gives around 12.15 V at full load. I have not yet seen a DPS1200FB shut down from overload on an S7, and the problem does not go away if I power only one hashboard at a time, so I am inclined to think that the PSU wimping out is an unlikely explanation. Individual chips burning out from mild overvoltage or transient spikes is more plausible, but I think that is still unlikely.
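For anyone who wants the sag implied by those two readings as a number, here is a quick sketch of the arithmetic. The function name `voltage_sag` is made up for illustration, and the only inputs are the 12.30 V and 12.15 V figures quoted above; it says nothing about what the hashboards themselves tolerate.

```python
def voltage_sag(v_open, v_load):
    """Return (sag in volts, sag as a percentage of the open-circuit voltage)."""
    sag = v_open - v_load
    return sag, 100.0 * sag / v_open


# Readings quoted above: 12.30 V open-circuit, about 12.15 V at full load.
sag_v, sag_pct = voltage_sag(12.30, 12.15)
print(f"sag: {sag_v:.2f} V ({sag_pct:.2f}%)")  # prints "sag: 0.15 V (1.22%)"
```

That is a drop of roughly 0.15 V, or a little over 1% of the open-circuit value.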
Intake air temperature was below 20°C at the time of the fault. The rig crashed at 1:25 am local time. No part of our cooling system was failing when the rig crashed. Both fans on the rig appear to be functional.
Just following up with the result of this in case anyone else has the same or a similar problem. One of my pet hates is people never updating 'please help me' posts with the resolutions to their problems, so I try not to do that myself. (The data cables work fine.)
crap...