Do you notice improved performance using CUDA 7.5 over 8?
I got +1MH using the latest version of the code from Git + CUDA 8.
x86 and x64 seem to be the same performance-wise.
x86 reboots my 1080/Ti rig (but every x86 build has caused a reboot so far); the 1070 rig runs fine and is +1MH faster on x86 vs x64.
x64 CUDA 8.0 is "faster" than 7.5 on 1080/Ti.
I'm talking about +1MH, same as what you noticed.
YMMV