I don't think it is that inefficient. As I understand it, scrypt was originally designed to be FPGA- and ASIC-resistant, although if something is worth enough, hardware for it will get developed anyway. An scrypt engine needs a lot more memory on the die to do the calculations, and accessing that memory introduces a bottleneck. I am just guessing, but SHA-256 may not need much memory at all, so everything can be built out of logic gates and therefore run at higher speed. I think (or thought) I read that an scrypt hash engine requires at least 1 Mbit of memory, and chips nowadays have 100+ engines. While 100 Mbit of memory is not a lot in itself, it may take a lot of die space to fit 100 engines and 100 separate blocks of memory and interface all of that off the chip. I could be way off on all this; it's just my guess.
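For what it's worth, here is a rough back-of-the-envelope check of those memory numbers. It is only a sketch under assumptions that are not in the post above: I am assuming the common Litecoin-style scrypt parameters (N = 1024, r = 1), using the usual 128 * r * N bytes estimate for the scrypt scratchpad, and counting only the eight working variables plus a 16-word message schedule window for SHA-256. The function name and the engine count of 100 are just for illustration.

```python
def scrypt_scratchpad_bytes(n: int, r: int) -> int:
    """Approximate size of the scrypt scratchpad (the big lookup array)."""
    # Each of the N entries is a block of 128 * r bytes.
    return 128 * r * n


# Assumed parameters (Litecoin-style), not taken from the post.
N, R = 1024, 1
ENGINES = 100  # illustrative engine count per chip

per_engine_bytes = scrypt_scratchpad_bytes(N, R)
per_engine_bits = per_engine_bytes * 8
total_bits = per_engine_bits * ENGINES

# SHA-256 keeps 8 working variables and a rolling window of 16
# message-schedule words, all 32 bits wide. Registers, not RAM.
sha256_state_bits = (8 + 16) * 32

MBIT = 2 ** 20
print(f"scrypt per engine: {per_engine_bytes} bytes = {per_engine_bits / MBIT} Mbit")
print(f"scrypt, {ENGINES} engines: {total_bits / MBIT} Mbit")
print(f"SHA-256 state per engine: {sha256_state_bits} bits")
```

Under those assumptions the scratchpad works out to 128 KiB, which is the 1 Mbit per engine figure, and 100 engines need about 100 Mbit (12.5 MiB) of on-die memory, versus well under a kilobit of state per SHA-256 engine.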