What seems apparent already is that there is no 20-30% speed gain. Weird.
On a non-overclocked 780Ti I went from ~440 kHash/s to 487 kHash/s (roughly a 10% improvement).
On a GT 750M I went from 55 kHash/s to 59 kHash/s (no texture read caching implemented so far).
Well, a 10% increase is better than a kick in the teeth.

Just a question: would it not be worth contacting the author and asking him to take a quick look and see whether there are any other improvements he could make for us?
Lastly, could you release your latest update as a beta or something? I would like to test it but can't compile from GitHub.
And keep it up!
