He's published his secret sauce. So let's race to integrate it; hopefully we can compensate for the last three difficulty increases.

Nice. Then I can finally finish mining on Coinotron and send you a 1 LTC donation.

GitHub currently has the new code in test_kernel.cu (the kernel prefix is X). It requires the -m 1 option.
I'm still experimenting with it. There's still some work to be done before it can be part of a cudaminer release (i.e. supporting Compute 3.0 devices and the -C option).
What already seems apparent is that there is no 20-30% speed gain. Weird.
On a non-overclocked GTX 780 Ti I went from ~440 kHash/s to 487 kHash/s (roughly a 10% improvement).
On a GT 750M I went from 55 kHash/s to 59 kHash/s, about 7% (texture read caching isn't implemented yet).
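For anyone who wants to double-check the percentages, here's a throwaway Python sketch. The helper name speedup_percent is made up for illustration and has nothing to do with cudaminer itself; it just computes (after / before - 1) * 100 on the hash rates quoted above, and shows what a 20-30% gain over the 780 Ti baseline would have looked like.

```python
def speedup_percent(before_khs: float, after_khs: float) -> float:
    """Relative hash-rate gain in percent: (after / before - 1) * 100."""
    if before_khs <= 0:
        raise ValueError("baseline hash rate must be positive")
    return (after_khs / before_khs - 1.0) * 100.0


# Hash rates quoted in the post, in kHash/s: (old kernel, new X kernel).
measurements = {
    "GTX 780 Ti (stock clocks)": (440.0, 487.0),
    "GT 750M": (55.0, 59.0),
}

for gpu, (before, after) in measurements.items():
    gain = speedup_percent(before, after)
    print(f"{gpu}: {before:.0f} -> {after:.0f} kHash/s = {gain:+.1f}%")

# For comparison: the advertised 20-30% gain applied to the 780 Ti baseline.
low, high = 440.0 * 1.20, 440.0 * 1.30
print(f"20-30% over 440 kHash/s would be {low:.0f}-{high:.0f} kHash/s")
```

That gives about +10.7% for the 780 Ti and +7.3% for the 750M, well short of the 528-572 kHash/s a 20-30% gain would imply.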
BTW: 1 LTC is too much of a donation to ask for at today's exchange rates. I need to update the readme file accordingly.