@Gomeler Agreed on both points. Design matters and we won't be moving past 28nm for a while.
As for design, there is something horribly broken in BFL's design and I can't see them doing a significant redesign at this point. Bitfury and BFL aren't the only two with vastly different efficiencies at the same scale. Granted, both of these are vaporware at this point, but I find it interesting that Hashfast is reporting ~0.6W/GH (at the chip) and KNC is reporting ~2.2W/GH (at the chip), and both of them are on a 28nm process. We will see when it gets to real silicon, but if that holds it is interesting.
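To put those two reported figures side by side, here is a quick back-of-the-envelope sketch. It only uses the ~0.6 and ~2.2 W/GH numbers quoted above (vendor claims, not measured silicon); the 1 TH/s target and the function names are my own assumptions for illustration.

```python
# Chip-level figures as reported by the vendors (claims, not measured silicon).
reported_w_per_gh = {"Hashfast": 0.6, "KNC": 2.2}

def gh_per_joule(w_per_gh):
    # W per (GH/s) is the same thing as J per GH, so GH/J is just the reciprocal.
    return 1.0 / w_per_gh

def chip_power_watts(w_per_gh, hashrate_gh_s):
    # Power drawn at the chip for a given hashrate (ignores PSU and board losses).
    return w_per_gh * hashrate_gh_s

# How many times more power the KNC figure implies for the same hashrate.
ratio = reported_w_per_gh["KNC"] / reported_w_per_gh["Hashfast"]

TARGET_GH_S = 1000  # hypothetical 1 TH/s of hashing, chosen only for comparison

for name, w in reported_w_per_gh.items():
    print(f"{name}: {w} W/GH = {gh_per_joule(w):.2f} GH/J, "
          f"{chip_power_watts(w, TARGET_GH_S):.0f} W at the chip for 1 TH/s")
print(f"KNC / Hashfast power ratio: {ratio:.2f}x")
```

If both claims hold, that is roughly a 3.7x gap in power at the chip for the same hashrate on the same process node.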
As for next gen, my guess is we won't be seeing anything smaller than 28nm for a long time. 20nm is expensive, and there is also a troubling trend which Nvidia highlighted a while back: the cost per transistor isn't going down much even once a process matures, so the smart money may be on optimization. I do think there will be incremental improvements on the same process (the "tock" in Intel's tick-tock strategy). I wonder how efficient 28nm chips can get (GH/mm2 and GH/J). When the low-hanging fruit is gone, it comes down to who can build the better company. All that boring stuff like finding solid suppliers, optimizing pricing so you can reduce incremental cost through larger runs, and building an OEM network. I think eventually most of the ASIC companies will just produce chips and leave the rest to partners. I also think there will be fewer companies operating in 2014, either through failures or consolidation.