If I get good results with -l T15x24 on a 780 Ti with 15 SMX, you should be able to use -l T14x24 on the Titan. I believe it has 14 SMX enabled, whereas the GTX 780 has 12 SMX.
Also make sure the card is not configured for double-precision computation (that makes it slower for single-precision workloads).
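
The rule of thumb above (one block per SMX, keeping the 24 from T15x24) can be sketched in a few lines of Python. This is only an illustration: launch_config and SMX_COUNT are made-up names, not part of cudaMiner, and the 24 is simply carried over from the 780 Ti setting on the assumption that it transfers to the other cards. Treat the output as a starting point to try, not a guaranteed optimum.

```python
# SMX counts for the cards mentioned in this thread.
SMX_COUNT = {
    "GTX 780 Ti": 15,
    "GTX Titan": 14,
    "GTX 780": 12,
}

def launch_config(card, per_block=24, kernel="T"):
    """Hypothetical helper: build a cudaMiner -l value with one block per SMX.

    Assumes the second number (24) that works on the 780 Ti carries over unchanged.
    """
    return f"{kernel}{SMX_COUNT[card]}x{per_block}"

if __name__ == "__main__":
    for card in SMX_COUNT:
        # Print a candidate -l argument for each card.
        print(f"{card}: -l {launch_config(card)}")
```

If the suggested value does not behave well on a given card, the second number is the part to vary while keeping the first equal to the SMX count.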
I also got that one bizarre autotune result of 1000 kH/s, but that was too good to be true... and in fact it wasn't working for me either.
Getting 650 kH/s with T15x24 on a Gigabyte 780 Ti OC edition with the latest cudaMiner version.
