Agreed, Ethminer is still very lucrative, although it looks like Nvidia hardware has hit the memory-controller limit and can't go any further, much like with CryptoNote.
Look at lyra2v2; it's also a memory-hard algorithm.
R9 280X, sgminer, open source (200 W): 4 MH/s
GTX 750 Ti, djm34 (sp-mod), open source (40 W): 5 MH/s
That's like writing an Ethereum miner that does 35 MH/s on the 750 Ti.
But the open-source one only does 8 MH/s...
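Putting the numbers above side by side, as a quick bit of Python arithmetic. The power and hash-rate figures are the ones quoted; the 28 MH/s Ethereum rate for the 280X is NOT stated anywhere in the thread. It's only what you'd back out if the 35 MH/s figure came from scaling by the same 5/4 ratio, so treat it as an assumption.

```python
# Lyra2REv2 figures quoted above: (hash rate in MH/s, board power in W).
lyra2v2 = {
    "R9 280X, sgminer": (4.0, 200.0),
    "GTX 750 Ti, djm34 sp-mod": (5.0, 40.0),
}

def kh_per_watt(mhs, watts):
    """Efficiency in kH/s per watt."""
    return mhs * 1000.0 / watts

for card, (mhs, watts) in lyra2v2.items():
    print(f"{card}: {kh_per_watt(mhs, watts):.1f} kH/s per W")

# Raw-speed advantage of the 750 Ti over the 280X on Lyra2REv2.
ratio = lyra2v2["GTX 750 Ti, djm34 sp-mod"][0] / lyra2v2["R9 280X, sgminer"][0]

# ASSUMPTION: if 35 MH/s is that same ratio applied to a 280X Ethereum rate,
# this is the 280X rate it implies (not stated in the thread).
implied_280x_eth_mhs = 35.0 / ratio

# How far the open-source 750 Ti Ethereum miner (8 MH/s) is from 35 MH/s.
gap = 35.0 / 8.0
print(f"ratio {ratio:.2f}, implied 280X rate {implied_280x_eth_mhs:.0f} MH/s, gap {gap:.2f}x")
```

The 750 Ti comes out at 125 vs 20 kH/s per W (about 6x the efficiency) and 1.25x the raw speed; the open-source Ethereum miner would need roughly 4.4x to reach 35 MH/s.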
(djm34 here...)
The main difference between Ethereum's algorithm and other memory-hard algorithms is that you can't scale down its memory usage: it always needs the full DAG to run (i.e. 1.2 GB or so of VRAM, with many random reads across the whole DAG).
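A toy Python sketch of why the DAG can't be shrunk. This is NOT real Ethash: the dataset is a hypothetical 1024-page stand-in for the ~1.2 GB DAG, hashlib's SHA3 is swapped in for Ethash's original Keccak, and each access reads a single 128-byte page. What it keeps is the Ethash-style FNV mixing and the 64 data-dependent reads per hash: the next page index comes out of the current mix, so it can't be known ahead of time, and across many nonces every page ends up being read.

```python
import hashlib
import struct

FNV_PRIME = 0x01000193
WORDS_PER_PAGE = 32   # 128-byte pages of 32-bit words
PAGES = 1024          # toy size; the real DAG is ~1.2 GB
ACCESSES = 64         # page reads per hash, as in Ethash

def fnv(a, b):
    """Ethash-style FNV mix of two 32-bit words."""
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF

def make_toy_dag(seed):
    """Hypothetical DAG stand-in: PAGES pseudo-random pages derived from a seed."""
    dag = []
    for i in range(PAGES):
        digest = hashlib.sha3_512(seed + struct.pack("<I", i)).digest()  # 64 bytes
        dag.append(list(struct.unpack("<16I", digest)) * 2)             # 32 words
    return dag

def toy_hashimoto(header, nonce, dag):
    """Return (digest, list of page indices read) for one nonce."""
    seed = hashlib.sha3_512(header + struct.pack("<Q", nonce)).digest()
    mix = list(struct.unpack("<16I", seed)) * 2
    first = mix[0]
    visited = []
    for i in range(ACCESSES):
        # The page index depends on the current mix, so it is only known once
        # the previous page has been folded in; it can't be computed ahead.
        p = fnv(i ^ first, mix[i % WORDS_PER_PAGE]) % PAGES
        visited.append(p)
        mix = [fnv(m, d) for m, d in zip(mix, dag[p])]
    digest = hashlib.sha3_256(header + struct.pack("<32I", *mix)).digest()
    return digest, visited

dag = make_toy_dag(b"toy-epoch")
touched = set()
for nonce in range(200):
    touched.update(toy_hashimoto(b"block-header", nonce, dag)[1])
print(f"pages touched by 200 nonces: {len(touched)} of {PAGES}")
```

Even a couple of hundred nonces spread their reads over essentially the whole toy DAG, which is why dropping part of it from VRAM isn't an option.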
Meaning you can't really improve past what has already been done... (yeah, I already tried).
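One way to see that ceiling: each Ethash hash makes 64 reads of 128 bytes from the DAG (ACCESSES and MIX_BYTES in the Ethash spec), i.e. 8 KiB of memory traffic per hash, so the best a card can do is its memory bandwidth divided by that. The bandwidth figures below are the commonly quoted spec-sheet peaks and are assumptions, not measurements; real DRAM efficiency with random 128-byte reads is lower, so this is an upper bound rather than a prediction.

```python
# Ethash constants: 64 DAG accesses per hash, 128 bytes (MIX_BYTES) each.
ACCESSES = 64
MIX_BYTES = 128
BYTES_PER_HASH = ACCESSES * MIX_BYTES  # 8192 bytes of DAG traffic per hash

def ethash_ceiling_mhs(bandwidth_gb_s):
    """Best-case MH/s if every byte of DRAM bandwidth went to DAG reads."""
    return bandwidth_gb_s * 1e9 / BYTES_PER_HASH / 1e6

# Spec-sheet peak bandwidths in GB/s (assumed, not measured).
cards = {"GTX 750 Ti": 86.4, "R9 280X": 288.0}
for name, bw in cards.items():
    print(f"{name}: <= {ethash_ceiling_mhs(bw):.1f} MH/s")

# The open-source 750 Ti figure from the thread, as a fraction of its ceiling.
print(f"8 MH/s is {8.0 / ethash_ceiling_mhs(86.4):.0%} of the 750 Ti ceiling")
```

That works out to roughly 10.5 MH/s for the 750 Ti and 35 MH/s for the 280X, which would put the open-source 8 MH/s at about three quarters of what the 750 Ti's memory bus can feed.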
But you should be able to make the kernel run at copy speed (the memory-bandwidth limit). The GPU can do register operations while it writes to memory. The Keccak passes should be integrated into the same kernel, with more than one hash per run. Then you get Keccak for free (memory pipelining).
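A rough timing model for the pipelining idea, in Python. It's an idealized sketch, not a measurement: t_pre, t_mem and t_post are hypothetical per-hash costs (initial Keccak, DAG reads, final Keccak) in arbitrary units, and it assumes the ALUs and the memory system run fully in parallel. With more than one hash in flight per thread, the Keccak work for neighbouring hashes hides under the DAG reads, so once memory dominates, the cost per hash drops to t_mem alone, which is one way to read "Keccak for free".

```python
def serial_time(n, t_pre, t_mem, t_post):
    """One hash at a time: initial Keccak, DAG reads, final Keccak."""
    return n * (t_pre + t_mem + t_post)

def pipelined_time(n, t_pre, t_mem, t_post):
    """Idealized software pipeline over n hashes in one thread.

    While the DAG reads for hash k are in flight, the ALUs run the final
    Keccak of hash k-1 and the initial Keccak of hash k+1, so each step costs
    max(memory, compute). Exact when memory dominates; an upper bound otherwise.
    """
    step = max(t_mem, t_pre + t_post)
    return t_pre + n * step + t_post

n = 100
serial = serial_time(n, t_pre=1, t_mem=8, t_post=1)
piped = pipelined_time(n, t_pre=1, t_mem=8, t_post=1)
print(f"serial: {serial}, pipelined: {piped}, per hash: {piped / n:.2f}")
```

With t_pre = t_post = 1 and t_mem = 8, 100 hashes take 1000 units serially and 802 pipelined, i.e. about 8 units per hash, the memory time alone. If the Keccak work ever exceeded the memory time, the model says the kernel would become compute-bound instead.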