I ran a quick test of the code modification.
I get quite a lot of "hash for nonce ... does not validate on CPU" messages.
However, the shares are accepted and the speed is 35 MH/s (instead of 28~30, depending on clock speed and number of instances).
What does this 768 correspond to? Is it the number of CUDA cores of the 750 Ti? (I need to see how this can be updated for the 780 Ti.)
I only modified cuda_hefty1.cu (I am lazy...) and compiled it with CUDA 5.5 (without --relocatable-device-code=true either) and, in principle, compute_3.5.