NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.
ATI GPUs are better at this.
You may be right; I don't know the instruction sets of each GPU brand/type well.
The instructions typically used in SHA-256 implementations (and, IMHO, in all SHA-2 implementations, since they
share the same scheme and differ mainly in word size, rotation amounts and number of rounds) are the bitwise
operators (AND, OR, NOT and XOR) on 32-bit words, the right-shift instruction, the rotate-right/left instructions,
and 32-bit modular addition.
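To make that instruction mix concrete, here is a minimal pure-Python sketch of SHA-256 written only with AND, NOT, XOR, shifts, rotates and additions masked to 32 bits. It is for illustration only (nowhere near a GPU or FPGA kernel). The helper names (`rotr`, `ch`, `maj`, `big_sigma0`, ...) are my own, and the constants are derived at run time from the primes, as FIPS 180-4 defines them, rather than typed in. Note the `& MASK` additions: besides the bitwise operators, every round also needs 32-bit modular addition.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every value in a 32-bit word

def rotr(x, n):
    """Rotate a 32-bit word right by n bits (same as rotate left by 32 - n)."""
    return ((x >> n) | (x << (32 - n))) & MASK

def shr(x, n):
    """Logical right shift of a 32-bit word."""
    return x >> n

def ch(x, y, z):   # "choose": AND, NOT, XOR
    return (x & y) ^ (~x & MASK & z)

def maj(x, y, z):  # "majority": AND, XOR
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(x):   return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
def big_sigma1(x):   return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)
def small_sigma0(x): return rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3)
def small_sigma1(x): return rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10)

def first_primes(n):
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes):
            primes.append(k)
        k += 1
    return primes

def icbrt(n):
    """Integer cube root: largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Initial hash: first 32 fractional bits of sqrt of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of cbrt of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]

def sha256(message: bytes) -> str:
    # Padding: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian bit length.
    bit_len = 8 * len(message)
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack(">Q", bit_len)
    h = H0[:]
    for off in range(0, len(message), 64):
        # Message schedule: 16 input words expanded to 64.
        w = list(struct.unpack(">16I", message[off:off + 64]))
        for t in range(16, 64):
            w.append((small_sigma1(w[t - 2]) + w[t - 7]
                      + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):  # 64 rounds of the compression function
            t1 = (hh + big_sigma1(e) + ch(e, f, g) + K[t] + w[t]) & MASK
            t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
            hh, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        # Feed-forward: add the compressed block into the running state.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return "".join(f"{x:08x}" for x in h)

if __name__ == "__main__":
    print(sha256(b"abc") == hashlib.sha256(b"abc").hexdigest())
```

Checking against `hashlib` is the easiest way to make sure a sketch like this is right before counting the instructions in it.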
A comparison of the cycles each of these instructions takes on each platform type (FPGA, GPU, Cell-like or other
SIMD) would be useful. I don't know whether anyone on the forum has already done this, along with a rough
cost estimate per technology.
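As a possible starting point for such a comparison, the sketch below counts the operations in one SHA-256 compression (one 64-byte block) using the textbook FIPS 180-4 formulas and turns them into a cycle estimate from a per-operation cost table. The cost figures are hypothetical placeholders, to be replaced with numbers from each vendor's documentation, and optimized kernels can use cheaper forms of `Ch`/`Maj`, so real counts may be lower. The `native_rotate` flag (my own name) shows why rotate support matters: without it, each rotate becomes two shifts and an OR.

```python
from collections import Counter

# Op counts for the textbook FIPS 180-4 formulas; optimized kernels may differ.
SMALL_SIGMA = Counter(rotr=2, shr=1, xor=2)   # sigma0 / sigma1 in the message schedule
BIG_SIGMA = Counter(rotr=3, xor=2)            # Sigma0 / Sigma1 in the rounds
CH = Counter({"and": 2, "not": 1, "xor": 1})  # (x & y) ^ (~x & z)
MAJ = Counter({"and": 3, "xor": 2})           # (x & y) ^ (x & z) ^ (y & z)

def ops_per_block():
    """Count 32-bit operations in one SHA-256 compression."""
    total = Counter()
    for _ in range(48):   # message schedule W[16..63]: 3 additions each
        total += SMALL_SIGMA + SMALL_SIGMA + Counter(add=3)
    for _ in range(64):   # rounds: T1 uses 4 additions, T2, e and a one each
        total += BIG_SIGMA + BIG_SIGMA + CH + MAJ + Counter(add=7)
    total += Counter(add=8)  # feed-forward into the 8-word hash state
    return total

def cycles_per_block(cost, native_rotate=True):
    """Weighted sum of op counts; `cost` maps op name -> cycles (hypothetical figures)."""
    ops = ops_per_block()
    if not native_rotate:
        # Emulate rotr(x, n) as (x >> n) | (x << (32 - n)): two shifts and one OR.
        n = ops.pop("rotr")
        ops["shr"] += n
        ops["shl"] += n
        ops["or"] += n
    return sum(count * cost[op] for op, count in ops.items())

# Placeholder cost table: every operation costs one cycle.
UNIT = dict.fromkeys(["rotr", "shr", "shl", "xor", "and", "or", "not", "add"], 1)

if __name__ == "__main__":
    print(dict(ops_per_block()))
    print(cycles_per_block(UNIT), cycles_per_block(UNIT, native_rotate=False))
```

With every operation costed at one cycle this gives 2296 operations per block with a native rotate and 3448 when rotates have to be emulated; filling in real per-instruction costs for each platform would give the comparison asked for above.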
On the other hand, building something for SHA-2 that can be reused by other projects
relying on SHA-2 would not be a waste of time or money.
If you or someone else builds something along those lines, I would be willing to invest some time
and money in the project.