I'll just... yeah, leave this here:
https://github.com/tsiv/ccminer-cryptonight
Do not motherfscking come to me with complaints about lag on Windows. You have been warned.

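Guess I'll take a look at breaking the one big kernel into smaller kernels that run sequentially, hopefully giving the OS some breathing room between launches. That should at least help with the TDR (Timeout Detection and Recovery) problem, where Windows resets the display driver when a single kernel runs too long.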
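A rough sketch of what "splitting the kernel" means, assuming the long-running part is the per-hash main loop: each launch runs only a slice of the iterations, and the per-hash state has to be carried over from one launch to the next (on the GPU that would mean writing it to global memory at the end of a launch and reading it back at the start of the next). This is a plain Python model, not CUDA, and `step`, `run_chunk`, `run_split` and the launch count are hypothetical names made up for illustration; `step` is a placeholder, not the real CryptoNight round. The total work stays the same; the idea is only that each individual launch finishes well under the Windows TDR timeout (2 seconds by default).

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass
class ThreadState:
    """Per-hash state that must survive between launches.

    In a real split CUDA kernel this would be stored in global memory at the
    end of each launch and loaded again at the start of the next one.
    """
    a: int
    b: int


def step(s: ThreadState) -> None:
    """Hypothetical stand-in for one main-loop iteration (NOT CryptoNight)."""
    s.a = (s.a * 6364136223846793005 + s.b) & MASK64
    s.b ^= s.a >> 17


def run_chunk(states, begin, end):
    """Model of one kernel launch: every 'thread' runs iterations [begin, end)."""
    for s in states:
        for _ in range(begin, end):
            step(s)


def run_monolithic(states, iterations):
    """One big launch doing the whole loop (the TDR-prone version)."""
    run_chunk(states, 0, iterations)


def run_split(states, iterations, launches):
    """Same work, issued as `launches` shorter launches back to back."""
    if launches < 1:
        raise ValueError("need at least one launch")
    batch = -(-iterations // launches)  # ceiling division
    for begin in range(0, iterations, batch):
        run_chunk(states, begin, min(iterations, begin + batch))
        # On a GPU, each launch ends here, so the driver gets a chance to
        # schedule display work before the next slice starts.


if __name__ == "__main__":
    # Kept small so the demo is quick; CryptoNight's main loop is 524,288
    # iterations per hash.
    ITERATIONS = 1 << 12
    seeds = [(i, i * 31 + 7) for i in range(8)]
    mono = [ThreadState(a, b) for a, b in seeds]
    split = [ThreadState(a, b) for a, b in seeds]
    run_monolithic(mono, ITERATIONS)
    run_split(split, ITERATIONS, launches=16)
    print(mono == split)  # True: splitting changes scheduling, not results
```

The only real cost of the split is the extra state traffic and launch overhead per slice, which is why the number of launches is a tradeoff between desktop responsiveness and hashrate.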
Getting 300 H/s with -l 5x120 on a 780 Ti (compiled with CUDA 6.0).