For NVIDIA cards (compute capability 3.5 and higher): github/djm34/ccminer-tpsp
For AMD cards: github/djm34/sgminer
sgminer is still a work in progress; I am working on a new kernel. Best settings so far for my 290X: -w 32 --thread-concurrency 512 -I 9. It is difficult to go higher in intensity because of the high memory requirement (the current speed is also rather low, for the same reason).
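For anyone who wants to try those 290X settings, here is a rough sketch of how the command line could be put together. The pool URL, wallet address, password and binary path are placeholders I made up for illustration. The algorithm/kernel selection option is left out on purpose, since it depends on the build. The script only prints the command so you can check it before running it.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own pool and wallet details.
POOL_URL="stratum+tcp://pool.example.com:3333"
WALLET="YOUR_WALLET_ADDRESS"

# Tuning flags from the post above (tested on a 290X only):
#   -w 32                     worksize
#   --thread-concurrency 512  thread concurrency
#   -I 9                      intensity (hard to raise because of memory use)
SGMINER_ARGS="-w 32 --thread-concurrency 512 -I 9"

# Assemble the full command; -o/-u/-p are the usual pool URL, user and password options.
CMD="./sgminer -o $POOL_URL -u $WALLET -p x $SGMINER_ARGS"

# Print instead of executing, so the line can be reviewed first.
echo "$CMD"
```

Other cards will most likely need different values, so treat these numbers as a starting point and not as a recommendation.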
This is really awesome. I can't wait for people to start testing this out.