Pallas, I compiled the pull request you were talking about and am currently getting a 4.5% increase (Decred) with CUDA 7.5 on a 670 and a 770.
Total: 1120 MH/s before vs. 1170 MH/s now.
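For reference, the 4.5% figure follows from the two totals. Here is a quick check in Python. The `percent_gain` helper is only for illustration and is not part of any miner.

```python
# Hypothetical helper: relative increase between two hashrate totals, in percent.
def percent_gain(before_mhs: float, after_mhs: float) -> float:
    """Return (after - before) / before as a percentage."""
    return (after_mhs - before_mhs) / before_mhs * 100.0

# Combined 670 + 770 totals quoted above (MH/s).
gain = percent_gain(1120.0, 1170.0)
print(f"{gain:.2f}%")  # prints 4.46%
```

So it is about 4.46%, which rounds to the 4.5% quoted.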
Nice! It looks like it matters even more for non-Maxwell cards.
I should have another 4% ready soon.
My private #7 (0.1 BTC, Windows exe) is 25% faster. The source code is for sale (0.4 BTC).