PS: for decred you can open pull requests directly on the decred repo...
Here it is!
Pull request on both your fork and decred's.
~10% speed increase!
Have fun :-)
Your first pull request gave +3.4%, while the second gave just +1% on 750 Tis (both compared with tpruvot's code).
I'm sorry, I don't have a 750 Ti to tune it with.