I thought I'd share my experience troubleshooting a very hot miner. Although it tended to slow down unpredictably, this miner would perform at 100% after a reset, usually staying there for a day or so. I decided to replace the thermal paste with Arctic Silver to see if that would bring the temperatures down from ~90 °C. Long story short, it did not. Now one board isn't responding, and the other has overheating problems. My extreme stupidity aside, it was a valuable education in the unit's operation. Here's a video I shot of it in operation:
https://www.youtube.com/watch?v=7nqr9Dsli24
Note the orange and red status lights on each board. I'm assuming they show the hashing status of individual cores. The "bad" board (on the left) would simply flash its orange lights as shown in the video, then reset itself (the fans go to maximum briefly, then slow down).
It would be really nice if they released the source code for the firmware running on the TI microcontroller they're using. I'm guessing they don't because yahoos like me would unintentionally change some setting that fries the chips or electronics. I don't blame them, but I still want it. What the device does can be partly ascertained from the cgminer source code, but that is all fairly high-level message passing, which mostly involves reporting values rather than setting them.
I am in a similar situation to yours. I applied Arctic Silver and now one board has its status lights on constantly. Is there a way to tell from the status lights what is causing the problem?