This gets stranger and stranger. I had some time this evening and took a chance on some additional troubleshooting, since I'm pretty much dead in the water otherwise. The board powers up fine when the A, C, and D PCI-E connectors are plugged in, but plugging in B is what causes the issue. I see nothing wrong with that connection.
If I try to get the board to hash in this state - I am not sure this is even possible - I get the Operation status 20 error and it continuously re-detects the board. I have not touched anything to do with firmware, and since that doesn't seem to be helping the others with this problem, I'm not going to try it yet. I am wondering whether it is complaining because that one chip isn't powered up, and whether it is possible to disable a chip. I'm not even sure if things are set up that way - just hoping for some way to get it partially alive again.
Thanks for any help.
In a conversation at some point, Dave advised me that it was worth trying to disable certain dies to resolve this issue. Essentially, the idea is to identify the bad die and then set its voltage to zero, so that it is disregarded and the healthy dies can keep working. With the version of the hcm tool I have, the command is something like # ./hcm --write-die-settings 1:0@0 if you wanted to disable die 1. I tried this tonight, but it did not solve my problem.
Hopefully Dave can chime in with further suggestions. I will report back if I have any luck tomorrow.
It's weird that it continues to give you error 20 even with a die disabled. Did you try running the dies individually to see if each one will run on its own? I.e.,
./hftool.py -w 0:950@900/1:0@900/2:0@900/3:0@900
then ./hftool.py -w 0:0@900/1:950@900/2:0@900/3:0@900
etc.
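If typing out all four variants gets tedious, here is a rough sketch that just prints the command lines for you. It only builds the strings and does not run hftool.py or talk to the board. I'm assuming the settings format is die:clock@millivolts, based purely on the examples above, and that there are four dies numbered 0 to 3; the function name and parameters are made up for illustration and are not part of hftool.

```python
def one_die_at_a_time(num_dies=4, clock=950, millivolts=900):
    """Yield one settings string per die, with only that die given a non-zero clock.

    Assumption: each field is die:clock@millivolts, as in the examples in this
    thread. The names here are hypothetical and not taken from hftool itself.
    """
    for active in range(num_dies):
        parts = [
            f"{die}:{clock if die == active else 0}@{millivolts}"
            for die in range(num_dies)
        ]
        yield "/".join(parts)


if __name__ == "__main__":
    # Print the commands only; copy and paste them one at a time.
    for settings in one_die_at_a_time():
        print(f"./hftool.py -w {settings}")
```

Run each printed command separately and note which die, if any, brings the status 20 error back.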