I "implemented" your code into cpuminer on an EC2 instance yesterday and can confirm the 220 kH/s. Very nice!

Thanks for your work on this and for sharing.
It does seem to slow down over time, though IIRC you mentioned that in the README. After restarting the miner it runs normally again.
Now if only cryptocurrencies all around weren't crashing like there's no tomorrow right now...

Grab the updated version of CudaMiner and compile it with CUDA 5.5. It shouldn't have the slowdown problem any more - I think my issue was the old CUDA version on my GTX 650 Ti machine.