Hi Christian, me again. Dunno if I'm barking up the wrong tree here, but if I reduce the number of CUDA threads in the kernel launch, i.e.:
'case 16: fermi_scrypt_core_kernelA<16><<< grid, threads, 0, stream >>>(d_idata); break;'
Say I set threads to 256 (<512). There's a massive increase in speed, but also quite a few errors.
Why the errors?
CUDA is still new to me...
http://s22.postimg.org/qdcboxcwh/Capture.jpg