T12x32 is way too big for scrypt-jane. That config is what you should use on regular scrypt coins. For yacoin it would require something like 50GB of GPU RAM.
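A rough back-of-the-envelope check of that 50GB figure, as a sketch only. It assumes a config like T12x32 means 12 blocks of 32 warps, 32 threads per warp, one scrypt scratchpad of 128 * r * N bytes per thread, and N = 32768 with r = 1 for yacoin. The N value and the one-scratchpad-per-thread model are assumptions for illustration, not something stated in this thread, and the function name is made up.

```python
def scratchpad_bytes(blocks, warps, n, r=1, threads_per_warp=32):
    """Estimate total scratchpad memory for a launch config (hypothetical helper).

    Assumes every thread hashes independently and needs its own
    128 * r * n byte scratchpad, which is the standard scrypt V array size.
    """
    threads = blocks * warps * threads_per_warp
    return threads * 128 * r * n


GIB = 1024 ** 3

# Regular scrypt (N = 1024) with T12x32
print(scratchpad_bytes(12, 32, 1024) / GIB, "GiB")

# Assumed scrypt-jane setting for yacoin (N = 32768) with T12x32
print(scratchpad_bytes(12, 32, 32768) / GIB, "GiB")

# The same assumed setting with a small config like T10x1
print(scratchpad_bytes(10, 1, 32768) / GIB, "GiB")
```

Under these assumptions T12x32 comes out at 1.5 GiB for regular scrypt and 48 GiB for the larger N, which is in line with the "something like 50GB" estimate, while T10x1 needs 1.25 GiB.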
Autotune is not perfect. Look at the chart (run with -D to show debug info), then try configs that are just outside the populated area of the chart until you've explored all the edges and found your limits. When you go too far, it will print something like this: "GPU #0:Launch config 'T11x2' requires too much memory!"
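If you want to be systematic about probing the edges, a small helper can list the configs next to one that works. This is only a sketch: it assumes the config string is an optional kernel letter followed by two numbers separated by "x" (as in T10x1), and the function names are made up for illustration. You still have to run each candidate by hand and watch for the "requires too much memory" message.

```python
import re


def parse_config(cfg):
    """Split a launch config like 'T10x1' into (kernel letter, first, second)."""
    m = re.fullmatch(r"([A-Za-z]?)(\d+)x(\d+)", cfg)
    if m is None:
        raise ValueError(f"unrecognised launch config: {cfg!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def neighbours(cfg, step=1):
    """Return the configs one step away from cfg in each direction.

    Candidates where either number would drop below 1 are skipped.
    """
    kernel, a, b = parse_config(cfg)
    out = []
    for da, db in [(-step, 0), (step, 0), (0, -step), (0, step)]:
        na, nb = a + da, b + db
        if na >= 1 and nb >= 1:
            out.append(f"{kernel}{na}x{nb}")
    return out


# Candidates to try around a config that is known to work
print(neighbours("T10x1"))
```

Start from the best config on the autotune chart, test its neighbours, and keep moving outward from whichever one is fastest until you hit the memory error.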
I'm using T20x1 on my 780 and getting 3.4 kH/s. T10x2 also works well for me.
Thanks for the tips. I've tried some, but T10x1 still seems to be the best. T20x1 doesn't perform well at all on my 780 - only 1.2 kH/s :\
Driver issue maybe? Running 331.92 here.