For existing FPGA designs the best that can be had is 23 MH/J. There is no reason to anticipate an improvement in FPGA power efficiency. Yes, there can be a marginal reduction of overhead and the FPGA can be scaled up, but its efficiency will not increase all that much. Based on existing designs we can anticipate 25 MH/J for FPGA. There is nothing special about ASIC; most ASIC vendors just use a custom programmed FPGA. This is called FPGA to ASIC conversion. So at best ASIC will be 50 MH/J, and I am being VERY generous here.
2x energy improvement going from FPGA to ASIC? Really? Please provide a link to support this claim. I have no idea (nor do I care) whether any of the current specs are legit, but FPGAs are horribly energy inefficient compared to a dedicated circuit. An energy improvement of 1000% isn't even that amazing in the move from FPGA to ASIC, and 5000%+ is certainly possible.
Here is an academic SHA-3 circuit (unoptimized, performing stream hashing) where the purpose wasn't even to test SHA-256, and it was built using an archaic 130nm process:
http://rijndael.ece.vt.edu/sha3/publications/DATE2012SHA3.pdf
~150 MH/J. Remember this is (from a Bitcoin point of view) an unoptimized design, as it is designed to hash an arbitrary amount of data. It is roughly 300% of what your claimed "theoretical max" would be, and that is at 130nm. At 90nm (4 generations old) it would be ~300 MH/J. At 65nm, closer to 600 MH/J. So where did this magical 50 MH/J max come from? Just admit it ... nowhere. It doesn't even make sense. The whole point of the move to ASIC is to get a MASSIVE reduction in energy consumption.
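For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch in Python. It assumes, as the figures above imply, that efficiency roughly doubles with each process node step from 130nm to 90nm to 65nm. The doubling factor and the function name are illustrative assumptions, not measured data.

```python
# Back-of-envelope sketch: assumes efficiency roughly doubles per node shrink.
# The 150 MH/J baseline is the approximate figure quoted for the 130nm design;
# the 2x-per-node gain is an assumption used only for illustration.
BASELINE_NODE_NM = 130
BASELINE_MH_PER_J = 150.0
CLAIMED_ASIC_MAX_MH_PER_J = 50.0  # the "at best" figure being disputed

NODES_NM = [130, 90, 65]  # process nodes mentioned in the post, in shrink order


def projected_efficiency(node_nm, gain_per_node=2.0):
    """Return projected MH/J at node_nm, assuming a fixed gain per node step."""
    steps = NODES_NM.index(node_nm) - NODES_NM.index(BASELINE_NODE_NM)
    return BASELINE_MH_PER_J * gain_per_node ** steps


for node in NODES_NM:
    mhj = projected_efficiency(node)
    ratio = mhj / CLAIMED_ASIC_MAX_MH_PER_J
    print(f"{node}nm: ~{mhj:.0f} MH/J ({ratio:.0f}x the claimed 50 MH/J max)")
```

Under those assumptions the 130nm design is already 3x the claimed ceiling, and the 65nm projection is 12x.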
It's one thing to design an application for a processor, and it is a completely different task to actually design the processor. Something like the Radeon 6970 GPU has 2.6 billion transistors. For FPGAs like the Spartan-6 we are still talking billions of transistors. For someone starting from scratch, can you imagine how long it would take to draw a wiring schematic with a billion components? This is what you would have to do to design a brand spanking new custom ASIC. ASIC stands for Application Specific Integrated Circuit, so either you have to piece it together via FPGA conversion or you have to design the circuit from scratch. This task would be no easier than it was to design an FPGA like the Spartan-6 in the first place.
Nobody designs ASICs by hand, just like nobody designs FPGAs by hand. They use high level libraries and design tools. Nobody cares where each individual transistor goes, just like a programmer doesn't care which exact memory address every single bit of memory goes to. It is abstracted away. Comparing these chips to either an FPGA or a GPU is a false comparison. These are SHA-256 hashers and will be significantly simpler (and smaller) than any general purpose device like a GPU or FPGA.
We are talking major dollars here.
Ok. Now say you had major dollars. Costing major dollars =/= impossible.