Many thanks for this.
Using Patoberli's build of commit 111 I was able to play around a bit. Unfortunately, the T kernel in Windows on my Titan is very unstable during autotune: anything that allocates more than 3GB of VRAM crashes Cudaminer outright. I'm not sure what limitation is causing this, but it has been a consistent observation over several hours of manual configuration. The Titan kernel also heavily favors multiples of the old T16x1, such as T64x1 -L 1, T64x2 -L 2, etc. Not sure why, but it makes picking out optimal settings easy.

On my Titan I was able to test and get 5.6-5.8 kh/s (varies but fairly even spread) using -i 0 -H 1 -l T32x8 -L 4 -a scrypt-jane:YAC with a mild Core OC of +250.
I will submit this and the full details to the spreadsheet after a full night of stable submissions.

Edit: I have broken 6 kh/s, but only about 80% were validated.

Nice to see a high number, but 80% of 6 is 4.8 kh/s, so no real benefit lol.
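
For anyone who wants to run the same comparison on their own numbers, here is a rough Python sketch of the arithmetic above. The function name `effective_khs` is just something I made up for illustration, and it assumes the 5.6-5.8 kh/s config validates close to 100%, which I have not confirmed yet.

```python
def effective_khs(raw_khs: float, validated_fraction: float) -> float:
    """Return the hashrate that actually counts: raw rate times the validated fraction."""
    if not 0.0 <= validated_fraction <= 1.0:
        raise ValueError("validated_fraction must be between 0 and 1")
    return raw_khs * validated_fraction


# Numbers from the post: ~6 kh/s raw with only about 80% validated.
pushed = effective_khs(6.0, 0.80)

# ASSUMPTION: the 5.6-5.8 kh/s settings validate at roughly 100%.
stable_low = effective_khs(5.6, 1.0)
stable_high = effective_khs(5.8, 1.0)

print(f"pushed config: {pushed:.2f} kh/s effective")
print(f"stable config: {stable_low:.2f} to {stable_high:.2f} kh/s effective")
```

If the stable config turns out to validate at less than 100%, just swap in the real fraction; the pushed settings only win if their effective number comes out higher.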