Christian, are you communicating with your Nvidia friend about CUDA 6? Will it give any performance enhancements for our old Fermi cards?
So far, the communication has been limited to a kernel submission from Nvidia.
It's a high-register-count (1 hash per thread) Compute 3.5 kernel that gives a marginal improvement over Dave Andersen's work. Unfortunately, it's not well suited to implementing a LOOKUP_GAP.
Christian
Told you that your work was getting noticed. Just didn't know it went all the way up to Nvidia itself.

On a more related note, I imagine you'll welcome CUDA 6 with open arms because of its simplified memory management.
Additionally, the ARM CPU that should be on Maxwell cards should be really nice for mining. I envision a Maxwell kernel that uses it to handle things that aren't great for the GPU, while keeping CPU usage more consistently near zero.