Nah, 6.5 on both boxes. Slightly older 6.5.12 on Linux and 6.5.19 (the latest 6.5, with compute 5.2 support I think) on Windows. Tried x64 builds too, doesn't seem to make much of a difference either way. Weird shit. I did manage to make the win build a little better by manually unrolling loops; it just looks like the win version of nvcc isn't really trying to figure that out itself. Which brings me back to weird shit.
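For anyone wondering what "manually unrolling" means here, a minimal sketch in plain C++ (not taken from any of the actual kernels; the round function, constants and names are made up for illustration). The first version leaves the unrolling to the compiler, the second spells out every iteration by hand. In CUDA code you would normally try `#pragma unroll` on the loop before going this far.

```cpp
#include <cstdint>

// Hypothetical 32-bit rotate; n must be in 1..31.
static inline uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

// Rolled version: relies on the compiler to unroll the 4-iteration loop.
uint32_t mix_rolled(uint32_t state) {
    // Illustrative constants only, not from any real hash kernel.
    const uint32_t k[4] = {0x243F6A88u, 0x85A308D3u, 0x13198A2Eu, 0x03707344u};
    for (int i = 0; i < 4; ++i) {
        state = rotl32(state ^ k[i], 7) + k[i];
    }
    return state;
}

// Manually unrolled version: same four rounds written out explicitly,
// so the result no longer depends on what the compiler decides to do.
uint32_t mix_unrolled(uint32_t state) {
    state = rotl32(state ^ 0x243F6A88u, 7) + 0x243F6A88u;
    state = rotl32(state ^ 0x85A308D3u, 7) + 0x85A308D3u;
    state = rotl32(state ^ 0x13198A2Eu, 7) + 0x13198A2Eu;
    state = rotl32(state ^ 0x03707344u, 7) + 0x03707344u;
    return state;
}
```

Both functions compute the same value for any input; the only thing that changes is how much work is left to the compiler.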
You should fork my branch and merge the lyra2 changes. My fork is already 500 kH/s faster than DJM34's open-source release without modding lyra2 (only the other algos). Big donations are waiting.
hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run it settles to the same values I get with the standard kernels...
edit: actually, the main difference I saw compared to my original settings came from raising the intensity (a parameter the user can adjust even in my release)