Re: SHA256d IC design question

Quote from: the_electronrancher on January 05, 2018, 06:47:44 PM

I'd like to learn a little more about this transistor level implementation, I'm having a hard time picturing what could reasonably be exploded or minimized in the hash core. Xor? It's just flops and wiring otherwise, I would be surprised if the flop was exploded, but maybe - if you have any links to check out, it would be an interesting read.

Here's an example of what can be optimized with transistor-level knowledge.

SHA-256 has 64 rounds. When they are unrolled, there are values that, once computed, have to be used in 16 different places (a fanout of 16). For this example let's simplify and assume that there are only 2 inputs and 6 outputs.

Code:

a<=
b<=
c<=
d<=
e<=
f<= x + y;

This can be optimized to:

Code:

a<=
b<=
c<= x + y;
d<=
e<=
f<= x + y;

The optimization is that the same value is computed twice, but in different physical locations on the die, so each copy of the signal needs shorter routes from its source to its destinations. Here's a more in-depth explanation:

https://en.wikipedia.org/wiki/FO4
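
To put rough numbers on the idea, here is a toy sketch in Python. It assumes the usual logical-effort approximation (gate delay = g*h + p, in units of tau) plus a wire delay that grows with the square of the unbuffered wire length. The function names and all the constants (wire lengths, RC factor) are made up for illustration and do not come from any real process or chip.

```python
# Toy delay model. All numbers are hypothetical, in arbitrary units of tau.

def stage_delay(fanout, logical_effort=1.0, parasitic=1.0):
    """Logical-effort delay of one driver loaded with `fanout` identical gates."""
    return logical_effort * fanout + parasitic

def wire_delay(length, rc_per_unit_sq=0.5):
    """Unbuffered RC wire: delay grows with the square of the length."""
    return rc_per_unit_sq * length ** 2

# One adder driving all 6 outputs; the farthest one is 4 units of wire away
# (assumed distance).
single = stage_delay(6) + wire_delay(4.0)

# Two adders, each placed next to its own 3 outputs; the longest wire is now
# 2 units (assumed distance).
duplicated = stage_delay(3) + wire_delay(2.0)

print("one adder, fanout 6: ", single)
print("two adders, fanout 3:", duplicated)
```

With these made-up numbers the duplicated version comes out well ahead. The model ignores the extra die area and the extra load that the second adder puts on whatever drives x and y, so a real design would have to weigh that against the shorter routes.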

Note that the above optimization is the opposite of the ASICBOOST "optimization".

We know that recent Bitmain chips have the capability to work both in the regular way and boosted with ASICBOOST (with a theoretical maximum of about 25% savings). We also know that when used in the boosted configuration they need to be clocked much lower and have lower overall performance (and probably a lower yield of chips that can work in boosted mode).

If Bitmain were capable of accurately simulating their chips, they wouldn't waste their resources on that exercise, because 25% is lower than the normal manufacturing tolerances on the process nodes they were using. Transistor-level simulation is nowadays more accurate than the manufacturing variance, and one can actually simulate the performance at the various process corners.

From the above we can deduce that they don't have any sort of transistor-level design; they just use standard cells and sandbag the design with wide safety margins. That is the same thing that KnCminer did years ago.

The other possibility is that Bitmain made their chips dual-capable (both boosted and un-boosted) for some non-technical, political or personal reasons. But that would mean that their chips are even less optimized than they could be, since they waste space on the unused boosting logic.