Post
Topic
Board Hardware
Re: HashFast launches sales of the Baby Jet
by
DeathAndTaxes
on 30/10/2013, 15:55:41 UTC
Ytterbium,
Thanks for the correction on rounds (65 vs 80) and for the link.  Link saved.  I assumed 80 because SHA-2 is an 80 round cipher; they must do some preprocessing or optimization, which makes sense, and it roughly fits the likely frequency range.

As Aerobatic stated, the GN die efficiency is 1.23 GH/mm² (nominal), 1.66 GH/mm² (overclocked).

Saying Bitfury has a theoretical speed of 5 GH/s is kinda pointless.  It was the design goal but was never achieved in the real world, not even once under lab conditions.  Real world Bitfury is more like 1.5 GH/s (nominal), 3.0 GH/s (overclocked)*.  That gives Bitfury (@55nm, 3.8mm x 3.8mm die) a die efficiency of 0.10 GH/mm² (nominal), 0.21 GH/mm² (overclocked).  You stated 40nm but Bitfury is actually 55nm, and the efficiency figures (both GH/mm² and J/GH) are impressive for 55nm**.  The numbers might look low to some readers, but that is the power of a couple of doublings.  To show you some bad efficiency, take BFL: it is 65nm, but let's boost their stats by 40% (65^2/55^2) to put BFL and Bitfury on the same 55nm process node.  BFL's die efficiency is 0.062 GH/mm²; with a 40% boost for 55nm vs 65nm, to make it apples to apples, it is still only 0.087 GH/mm².  And that is with the chips overvolted and driven pretty hard and hot.  Ouch, BFL.
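For anyone who wants to check the arithmetic above, here is a rough sketch in Python. The function names are mine, and the Bitfury die area is assumed to be the 3.8mm x 3.8mm figure quoted elsewhere in this post; the node scaling is the idealized "area goes with feature size squared" rule, nothing more.

```python
# Assumption: Bitfury die is 3.8 mm x 3.8 mm (the figure used in this post).
BITFURY_DIE_MM2 = 3.8 * 3.8  # 14.44 mm^2


def die_efficiency(ghs, die_mm2):
    """Hashrate per unit die area, in GH/s per mm^2."""
    return ghs / die_mm2


def node_scale(from_nm, to_nm):
    """Idealized density gain from a shrink: area scales with feature size squared."""
    return (from_nm / to_nm) ** 2


bitfury_nominal = die_efficiency(1.5, BITFURY_DIE_MM2)
bitfury_overclocked = die_efficiency(3.0, BITFURY_DIE_MM2)

# BFL's 65nm figure normalized to 55nm for an apples-to-apples comparison.
bfl_at_55nm = 0.062 * node_scale(65, 55)

print(f"Bitfury nominal:     {bitfury_nominal:.2f} GH/mm^2")
print(f"Bitfury overclocked: {bitfury_overclocked:.2f} GH/mm^2")
print(f"BFL scaled to 55nm:  {bfl_at_55nm:.3f} GH/mm^2")
```

The 65nm to 55nm factor comes out just under 1.40, which is where the "40% boost" comes from.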

So how would a Bitfury @ 28nm compare to HashFast Golden Nonce?   Note: this shouldn't influence anyone's purchase decision, as we don't even know if Bitfury plans a 28nm chip, when it would be released, if they will hand place it, or if these theoretical gains are possible.  With that caveat, a die shrink from 55nm to 28nm means roughly 4x the transistor density (55^2/28^2 ≈ 3.9).  Let's be generous and say 50% higher clocks are possible (real world is probably going to be less than what raw capacitance and switching time would indicate, but I am trying to err towards the upper limit).  Between the clock increase and the transistor density, the performance gain is likely going to be capped at 6x.  A theoretical "Bitfury28" would be ~0.60 GH/mm² (nominal), 1.26 GH/mm² (overclocked).  This is with no architecture changes, just the same design with smaller features (the "tick" in Intel's "tick tock" strategy), scaled out in parallel (more cores, similar chip dimensions).

There are power issues which prevent Bitfury from achieving the 5 GH/s (420 MHz) design spec at 55nm.  If they were solved, efficiency would be higher.  This is somewhat academic, as the 5 GH/s spec came from simulation and relied on the highly efficient use of hand placing.  However, hand placing has a lot of pitfalls, and Bitfury fell into one, making real world performance lower.  "Bitfury28", while having the same general design, may not be hand placed; hand placing is time consuming and more risky.  At 55nm it didn't pay off: all the extra work and effort produced a chip which is roughly what we would expect from a cell library design.  Still, to be totally exhaustive, if the Bitfury had performed as expected it would be 5 GH / (3.8mm * 3.8mm) = 0.34 GH/mm².  With a 6x performance improvement that would be ~2.00 GH/mm²; with a more realistic 4x performance improvement, ~1.36 GH/mm².
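The "Bitfury28" projection can be sketched the same way. This is only the back-of-envelope model described above (density gain times clock gain); the function name and defaults are mine, and the 4x and 1.5x factors are the generous assumptions from this post, not measured values.

```python
def project_shrink(eff_ghs_mm2, density_gain=4.0, clock_gain=1.5):
    """Scale a die efficiency by an assumed density gain and clock gain."""
    return eff_ghs_mm2 * density_gain * clock_gain


# Idealized density gain for 55nm -> 28nm, before rounding up to 4x.
ideal_density_gain = (55 / 28) ** 2

# Real world 55nm figures, projected with the generous 6x total (4x * 1.5x).
bitfury28_nominal = project_shrink(0.10)
bitfury28_overclocked = project_shrink(0.21)

# Design spec case: 5 GH/s on a 3.8 mm x 3.8 mm die, never reached in practice.
spec_eff = 5.0 / (3.8 * 3.8)
spec_6x = project_shrink(spec_eff)                  # density and clock gain
spec_4x = project_shrink(spec_eff, clock_gain=1.0)  # density gain only

print(f"ideal density gain: {ideal_density_gain:.2f}x")
print(f"Bitfury28 nominal / overclocked: {bitfury28_nominal:.2f} / {bitfury28_overclocked:.2f} GH/mm^2")
print(f"spec case at 6x / 4x: {spec_6x:.2f} / {spec_4x:.2f} GH/mm^2")
```

The spec case comes out a touch higher here than the figures in the text because the code does not round 5 / 14.44 down to 0.34 before multiplying.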

To add to what Aerobatic said about Bitfury being a hand placed chip, this has some implications going forward.  Very few designers hand place large ASICs because of the increased time and cost.  Cell libraries are used to assist with the change in feature size; without them you are essentially hand placing a new chip at each process node.  While the design may be exactly the same, the position of each transistor is going to change.  One also needs to consider risk vs reward.  Bitfury's power issues stem from the fact that the design was not capable of delivering the power intended, and without the intended power the intended clock frequency couldn't be achieved.  The chip "worked" but had to be clocked slower.  It is possible this type of error could have been avoided using a cell library.  So had Bitfury used a cell library, maybe they would have got 4 GH/s per chip: less than what hand placing could do in theory, but more than it did in reality.  In mid 2013 Bitfury's mistake wasn't fatal.  BFL was not delivering, margins were massive, and other competitors were using much larger process nodes.   In 2014 the scenario won't be the same.  A similar mistake would make an offering less attractive than competitors.


*Based on the reference design of ~25 GH/s per "H-board" with 16 chips.   A few higher clocked variants (like the S-board project) have pushed that to 42-45 GH/s.  Assuming I am missing some marginally higher clocked board, I optimistically used 3 GH/s as the realistic overclock limit.
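A quick sketch of how the board figures in this footnote translate to per-chip numbers (the helper name is mine; the inputs are the board hashrates and the 16-chip count quoted above):

```python
CHIPS_PER_BOARD = 16


def per_chip_ghs(board_ghs, chips=CHIPS_PER_BOARD):
    """Average hashrate per chip for a board with the given total hashrate."""
    return board_ghs / chips


reference = per_chip_ghs(25)  # reference "H-board"
fast_low = per_chip_ghs(42)   # low end of the higher clocked variants
fast_high = per_chip_ghs(45)  # high end of the higher clocked variants

print(f"reference board: {reference:.2f} GH/s per chip")
print(f"higher clocked:  {fast_low:.2f} to {fast_high:.2f} GH/s per chip")
```

The higher clocked boards land a bit under 3 GH/s per chip, which is why 3 GH/s is the optimistic round number.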


** Moved this down here because most people probably don't care.  A side note: it may seem I am critical of Bitfury, but I am not; I am just interested (obsessed maybe?) in finding good data.  Hell, IMHO it is impressive that Bitfury is even still around.  Right at the point where they had perfected their FPGA design (better than anyone else on the same hardware) and were looking to mass produce (summer 2012), BFL began the obviously false campaign of "ASICs in 3 months".  Remember when BFL was going to deliver ASICs by fall 2012?  BFL may be crooked but they are smart.  By offering upgrade value on their FPGA and over-promising ASICs almost a year early, they killed the rest of the FPGA market while still keeping BFL FPGA sales alive (they could be upgraded).  Many startups wouldn't have survived seeing their entire market disappear with no revenue potential for a year.  It was unfair, dishonest, and a sucker punch, and it wouldn't have surprised me if BFL had killed Bitfury.   Instead the Bitfury team transitioned their FPGA design to a hand placed ASIC in a short period of time, delivered solid performance, and did so without the ability to collect (and sit on) preorder money in 2012.  So I am impressed by what Bitfury has done, but that doesn't diminish the impressive die efficiency of the Golden Nonce processor.