Try using K4x32. I get about 180 kH/s on my 650 Ti.
180? At stock clocks (no overclock) I only get about 70...
It may help to reduce the x32 to x28, x24 or x20 and compare the speeds. My GTX 780 is a bit tricky there as well.
Also, have you autotuned that K kernel?
I don't mine VTC (and I don't have a lot of free time at the moment), so I've only tried 'cudaminer --algo=scrypt:2048 -C 0'.
Do you think that change could triple the result?
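One way to check rather than guess is to run the same command once per launch config and note the hashrate each time. Below is a rough Python sketch that only prints the command lines to try. It takes the base command and the K4x32 / x28 / x24 / x20 values from the posts above. The `-l` launch-config option is my assumption about how the config gets passed, so check `cudaminer --help` for your build. `candidate_commands` and `speedup` are just made-up helper names.

```python
# Sketch: list the cudaminer command lines to compare, one per launch config.
# Nothing here runs the miner; it only builds the strings to copy and paste.

BASE_CMD = "cudaminer --algo=scrypt:2048 -C 0"  # command quoted in the thread
KERNEL_PREFIX = "K4"                            # from the K4x32 suggestion
SECOND_VALUES = [32, 28, 24, 20]                # x32, x28, x24, x20


def candidate_commands(base=BASE_CMD, prefix=KERNEL_PREFIX, values=SECOND_VALUES):
    """Return one command line per launch config.

    ASSUMPTION: "-l" is the option that sets the launch config.
    """
    return [f"{base} -l {prefix}x{v}" for v in values]


def speedup(new_khs, old_khs):
    """Ratio between two measured hashrates, e.g. speedup(180, 70)."""
    return new_khs / old_khs


if __name__ == "__main__":
    for cmd in candidate_commands():
        print(cmd)
    # The two figures mentioned above: about 70 and about 180 kH/s.
    print(f"70 -> 180 kH/s would be roughly {speedup(180, 70):.1f}x")
```

Going from about 70 to about 180 kH/s would be a bit under 3x, so a launch config change alone would have to do a lot. Comparing the four runs side by side on the same card should show how much of the gap it really explains.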