Not entirely sure...
#threads = 256 * 64 * 5 on CUDA 7.5. Intensity is defined by #threads = 2^intensity, so here it is about 16.32 (256 * 64 * 5 = 81920 threads)
and
#threads = 256 * 64 * 4 on CUDA 6.5 (probably possible to raise it here) ==> intensity = 16
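
The relation #threads = 2^intensity can be checked with a few lines of host code. This is only a sketch of the arithmetic, not code taken from the miner: the helper names `thread_count` and `intensity_from_threads` are made up for illustration. It is host-only, so it builds as plain C++ or inside a .cu file.

```cuda
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: total thread count as the product of the three
// factors quoted in the post (e.g. 256 * 64 * 5).
inline uint64_t thread_count(uint64_t a, uint64_t b, uint64_t c)
{
    return a * b * c;
}

// Since #threads = 2^intensity, intensity is the base-2 logarithm
// of the thread count.
inline double intensity_from_threads(uint64_t threads)
{
    assert(threads > 0);  // log2 is undefined for zero threads
    return std::log2(static_cast<double>(threads));
}
```

With these helpers, `intensity_from_threads(thread_count(256, 64, 4))` gives exactly 16, and `intensity_from_threads(thread_count(256, 64, 5))` comes out at roughly 16.32.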
THANK YOU djm34!!!!!

My 970 now runs @ 657 kH/s!!!!!!!!
Before, I was using sp release 7.4 and getting around 542 kH/s.
I'm currently using your CUDA 6.5 version.
Can you give more info on the difference between the standard and nosync versions?
Come on guys, I see a lot of results here: screenshots, speeds from different cards.
If it's not too much to ask, add those results here:
Mining Hardware Comparison 
Already submitted.

nosync refers to the __syncthreads() call, which should be used when initializing shared memory. However, in some cases (and on some cards) it isn't really needed, and since it tends to slow down the kernel, removing it may increase the hashrate. On some cards, though, removing it may cause CPU validation issues.
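
To make the trade-off concrete, here is a minimal sketch of the pattern, not the miner's actual kernel. The table `d_table`, the kernel `lookup_kernel`, the `USE_SYNC` switch and the helper `count_mismatches` are all hypothetical names. Each thread writes one shared-memory entry and then reads an entry written by a different thread. Without the barrier that read is not guaranteed to see the initialized value, so the CPU check may report mismatches on some cards.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TABLE_SIZE 256

// Hypothetical constant table that the kernel stages into shared memory.
__constant__ unsigned int d_table[TABLE_SIZE];

template <bool USE_SYNC>
__global__ void lookup_kernel(unsigned int *out)
{
    __shared__ unsigned int s_table[TABLE_SIZE];

    // Each thread initializes one shared-memory entry (blockDim.x == TABLE_SIZE).
    s_table[threadIdx.x] = d_table[threadIdx.x];

    // "standard" build: wait until every thread of the block has written.
    // "nosync" build: skip the barrier and save its cost.
    if (USE_SYNC)
        __syncthreads();

    // Read an entry that was written by a different thread.
    unsigned int idx = (threadIdx.x * 7u + 13u) % TABLE_SIZE;
    out[blockIdx.x * blockDim.x + threadIdx.x] = s_table[idx];
}

// Runs one variant and counts results that disagree with the CPU reference.
template <bool USE_SYNC>
int count_mismatches(const unsigned int *h_table, int blocks)
{
    const size_t n = static_cast<size_t>(blocks) * TABLE_SIZE;
    unsigned int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(unsigned int));

    lookup_kernel<USE_SYNC><<<blocks, TABLE_SIZE>>>(d_out);
    cudaDeviceSynchronize();

    std::vector<unsigned int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned int t = static_cast<unsigned int>(i % TABLE_SIZE);
        if (h_out[i] != h_table[(t * 7u + 13u) % TABLE_SIZE])
            ++mismatches;  // the CPU validation failed for this result
    }
    return mismatches;
}

int main()
{
    unsigned int h_table[TABLE_SIZE];
    for (unsigned int i = 0; i < TABLE_SIZE; ++i)
        h_table[i] = 0x9e3779b9u * (i + 1u);  // arbitrary test values
    cudaMemcpyToSymbol(d_table, h_table, sizeof(h_table));

    printf("with sync:    %d mismatches\n", count_mismatches<true>(h_table, 64));
    printf("without sync: %d mismatches\n", count_mismatches<false>(h_table, 64));
    return 0;
}
```

Whether the version without the barrier actually misbehaves depends on the card and on how the block is scheduled, which matches the "some cards" caveat above. Error checking on the CUDA calls is left out to keep the sketch short.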