I'm surprised that BFL didn't get better efficiency than they did. Just going from 90nm to 65nm should double efficiency, right? Then where's the additional advantage of using the full custom approach? I'm comparing with the current power estimate of the bASIC.
It's not just the fabrication process size that matters. How the hardware is implemented (for example, full custom versus standard-cell design) is a huge factor too.
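For a rough sense of where a "double" figure comes from, here is a back-of-envelope sketch. It assumes an idealized shrink, where transistor density scales with the inverse square of feature size. It also assumes textbook constant-field (Dennard) scaling, where energy per switching event goes roughly as C·V² and both C and V shrink linearly with feature size. By the 90nm/65nm era, supply voltage had largely stopped scaling. So real-world gains are usually well below these ideal numbers, and that gap is where implementation choices matter. The function names are purely illustrative.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized density gain: transistors per area scale as 1 / feature_size**2."""
    return (old_nm / new_nm) ** 2


def ideal_energy_gain(old_nm: float, new_nm: float) -> float:
    """Idealized energy-per-switch gain under constant-field (Dennard) scaling.

    Energy ~ C * V**2, with C and V both assumed to shrink linearly with
    feature size, so energy per switch scales as feature_size**3.
    This assumption did not hold well in practice at these nodes.
    """
    return (old_nm / new_nm) ** 3


print(f"Ideal density gain 90nm -> 65nm: {ideal_density_gain(90, 65):.2f}x")
print(f"Ideal energy-per-switch gain 90nm -> 65nm: {ideal_energy_gain(90, 65):.2f}x")
```

The density ratio comes out at about 1.92x, which is roughly the "double" in the question. Whether a chip actually reaches that in hashes per joule depends on voltage, leakage, clocking and layout, not on the node alone.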