Post
Topic
Board Hardware
Re: BITMAIN launches 4th generation Bitcoin mining ASIC: BM1385
by
2112
on 21/08/2015, 23:15:43 UTC
You have completely the wrong view of full custom, a rolled design would be a really dumb idea for a modern mining chip and very area inefficient, the customisation involves only two circuit elements, but I'm sure you know that. Not rocket science at all, no magic, and very little risk if you have some respect for semiconductor physics. DRC is there for very good reasons which again I'm sure you know, and only an idiot would even consider violating them.
The rolled vs. unrolled choice isn't fully resolved. The losses and noise in the very long lines that drag the signals over 15 SHA-256 rounds are quite significant. I think the bitfury approach of routing hashed words in one direction and constant SHA coefficients in a perpendicular direction gives overall savings over trying to squeeze out combinatorial optimizations after fully unrolling. Most of the combinatorial optimization gain is achieved by SHA-256 round pairing, i.e. 32 round-pairs instead of the by-the-FIPS explicit 64 single rounds.
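To make the round-pairing idea concrete, here is a minimal software sketch in Python. It is only an illustration of the arithmetic, not anybody's actual circuit: `round_pair`, `single_round` and the other helper names are made up for this post, and the particular way the two rounds are merged (pre-adding `h + K + W` and `g + K + W` before the pair starts) is just one plausible reading of "pairing". The constants are derived from the FIPS definition (fractional parts of square and cube roots of primes) instead of being typed in by hand.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def icbrt(n):
    # Exact integer cube root by binary search.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants and initial hash value, computed per the FIPS definition.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in first_primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)

def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)

def ch(e, f, g):
    return (e & f) ^ (~e & g)

def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)

def single_round(state, k, w):
    # One by-the-FIPS round.
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def round_pair(state, k0, w0, k1, w1):
    # Two rounds merged (hypothetical formulation). Both pre-sums use only
    # values known before the pair starts: the second round's "h" is the
    # incoming g, so neither addition sits behind the first round's result.
    a, b, c, d, e, f, g, h = state
    pre0 = (h + k0 + w0) & MASK
    pre1 = (g + k1 + w1) & MASK
    t1 = (pre0 + big_sigma1(e) + ch(e, f, g)) & MASK
    a1 = (t1 + big_sigma0(a) + maj(a, b, c)) & MASK
    e1 = (d + t1) & MASK
    t1b = (pre1 + big_sigma1(e1) + ch(e1, e, f)) & MASK
    a2 = (t1b + big_sigma0(a1) + maj(a1, a, b)) & MASK
    e2 = (c + t1b) & MASK
    return (a2, a1, a, b, e2, e1, e, f)

def schedule(block):
    # Expand one 64-byte block into the 64-word message schedule.
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def sha256(message, paired=True):
    h = list(H0)
    pad = b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded = message + pad + struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        w = schedule(padded[off:off + 64])
        state = tuple(h)
        if paired:
            for i in range(0, 64, 2):  # 32 round-pairs
                state = round_pair(state, K[i], w[i], K[i + 1], w[i + 1])
        else:
            for i in range(64):  # 64 single rounds
                state = single_round(state, K[i], w[i])
        h = [(x + y) & MASK for x, y in zip(h, state)]
    return struct.pack(">8I", *h)

# Both formulations should agree with the reference implementation.
assert sha256(b"abc", paired=True) == hashlib.sha256(b"abc").digest()
assert sha256(b"abc", paired=False) == hashlib.sha256(b"abc").digest()
```

In software the two loops are equivalent by construction. The point is only to show which terms of the second round are available early; whether that buys area or power in silicon depends on the layout and is exactly the tradeoff under discussion.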

I did not do a full analog modeling of both choices (rolled/unrolled) for SHA-256. But I've done something similar in the past that was bound by the speed of carry-look-ahead adders. I actually doubt that anyone here on this forum (maybe with the exception of bitfury) did the required tradeoff analysis. My scientific wild-ass guess is that a Bitcoin miner could well be an example of one such circuit where leaving things rolled will be of great benefit. The very high toggle ratio (only 6dB below the theoretical maximum of a ring oscillator) will probably benefit from using some sort of SCL (source-coupled logic) or CML (current-mode logic) instead of the garden-variety CMOS bang-bangs that every CAD monkey throws at the Bitcoin mining problem.
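For what it's worth, the toggle-ratio figure can be sanity-checked in software. The sketch below is a rough illustration under stated assumptions, not a circuit simulation: it assumes a rolled design that holds the eight working words in a 256-bit register, counts the fraction of those bits that flip per round for random input blocks, and assumes the "6dB" is meant in the 20·log10 (amplitude) convention relative to a ring oscillator where every node toggles every cycle (ratio 1.0). The function names are made up for this example.

```python
import math
import random

MASK = 0xFFFFFFFF

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def icbrt(n):
    # Exact integer cube root by binary search.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 round constants and initial state from the FIPS definition.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in first_primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha_round(state, k, w):
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    t1 = (h + s1 + ((e & f) ^ (~e & g)) + k + w) & MASK
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    t2 = (s0 + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def expand(words):
    # Message schedule from 16 input words to 64.
    w = list(words)
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def mean_toggle_ratio(blocks=200, seed=1):
    # Fraction of the 256 state-register bits that flip per round,
    # averaged over random input blocks.
    rng = random.Random(seed)
    flipped = total = 0
    for _ in range(blocks):
        w = expand([rng.getrandbits(32) for _ in range(16)])
        state = tuple(H0)
        for i in range(64):
            new = sha_round(state, K[i], w[i])
            flipped += sum(bin(x ^ y).count("1") for x, y in zip(state, new))
            total += 256
            state = new
    return flipped / total

ratio = mean_toggle_ratio()
print("toggle ratio:", ratio)
print("relative to ratio 1.0, in dB (20*log10):", 20 * math.log10(ratio))
```

The ratio should land close to one half, which is about -6dB in the 20·log10 convention (it would be about -3dB if read as a power ratio). This only counts the state register; the adders and the message schedule logic are not modeled.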

People do fully unrolled hashers because the logic synthesis tools use heuristic place & route algorithms that don't converge or converge extremely slowly on the rolled designs.

As far as I understand, full DRC compliance on a 28nm "mature" process is very, very conservative. I don't have any exact numbers handy, but the gate error ratios that a "digital" manufacturing process is designed to guarantee are far stricter than what a Bitcoin miner needs, since a miner can easily tolerate a percentage point of errors. Violating some of the DRC to shed the unnecessary margins is one of the simplest ways to save power, after the obvious things like dropping JTAG and other testability overheads.

Re-reading your first sentence, I don't really understand the part
Quote
the customisation involves only two circuit elements, but I'm sure you know that.
Could you restate what you had in mind?