I'm starting to think that a process-invariant metric of power efficiency isn't possible -- at least not one that can be determined by testing (i.e. without the circuit schematics and layout parasitics, neither of which any vendor is ever going to release).
Yeah, that's what I was trying to say.

This might work for FPGA designs, but not for true ASICs.
A crappy layout done on a better process, or even one with better packaging that allows more heat transfer, might do better than a great design done on a crappy process, or than a great design with some flaw that makes it run hot. A company might spend its R&D money improving the yield, working out the best thermal management, etc.
Think about it this way: lots of people used to rag on x86 as an inferior instruction set compared to RISC designs, but x86 always ended up with better performance in the end, because Intel and AMD competed with each other and had a lot more money to throw at working around the "problems" with x86 (like using a RISC-style core internally and translating x86 instructions into it).
The purest approaches, the designs closest to perfection, don't always win out in the end. All that matters is real-world performance.