before compiling, set all occurrences of #define MAXWELL_OR_FERMI
to 1, and it runs a good chunk faster on Maxwell.
it ships with it set to 1 by default, so i just compile it

for compute 3.0 and 2.0, i change it to 0
I'm pretty sure the compute 5.0 build in v.08 was a decent jump in speed over the 3.5 build on my 750s, though my memory could be foggy. Maybe it was something else?
Can you do a compute 5.0? I think if work calms down I'll start compiling or tweaking as well, it's been a while.