I am working on some secret sauce to make Kepler generation devices even faster.
The secret sauce is currently on GitHub in test_kernel.cu (which now requires Compute 3.5 to build). Be sure to run autogen.sh to update the Makefile before compiling.
However, the secret sauce seems to be an undesirably "hot sauce": the card runs a bit hotter and slower than with the old Titan kernel. You get the experimental kernel by using a launch config prefixed with -X.
I get 417 kHash/s with the experimental kernel and up to 450 kHash/s with the old Titan kernel (cudaminer 32-bit version). The experimental stuff is interesting from a technical standpoint. It gets rid of shared memory entirely, replacing it with Kepler's warp shuffle functionality.
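To illustrate the general idea of trading shared memory for warp shuffle, here is a minimal sketch. It is not the code from test_kernel.cu: the kernel names rotate_shared and rotate_shuffle and the "read your neighbour's value" exchange pattern are made up for this example. It also assumes a CUDA 9 or newer toolkit, where the intrinsic is spelled __shfl_sync (on Kepler-era toolkits it was __shfl without the mask argument).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP 32  // threads per warp; the example launches exactly one warp

// Hypothetical example kernel: exchange through shared memory.
// Each lane publishes its value, then reads the value of lane+1 (wrapping).
__global__ void rotate_shared(const unsigned *in, unsigned *out)
{
    __shared__ unsigned buf[WARP];
    unsigned lane = threadIdx.x & (WARP - 1);
    buf[lane] = in[threadIdx.x];
    __syncthreads();  // make all writes visible before anyone reads
    out[threadIdx.x] = buf[(lane + 1) & (WARP - 1)];
}

// Hypothetical example kernel: the same exchange with warp shuffle.
// The value travels register to register; no shared memory is declared.
__global__ void rotate_shuffle(const unsigned *in, unsigned *out)
{
    unsigned lane = threadIdx.x & (WARP - 1);
    unsigned v = in[threadIdx.x];
    // Full mask: all 32 lanes take part; each reads v from lane+1 (wrapping).
    out[threadIdx.x] = __shfl_sync(0xffffffffu, v, (lane + 1) & (WARP - 1));
}

int main()
{
    unsigned h_in[WARP], h_a[WARP], h_b[WARP];
    for (int i = 0; i < WARP; ++i) h_in[i] = 1000u + i;

    unsigned *d_in, *d_a, *d_b;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return 2;
    if (cudaMalloc(&d_a, sizeof(h_a)) != cudaSuccess) return 2;
    if (cudaMalloc(&d_b, sizeof(h_b)) != cudaSuccess) return 2;
    if (cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) != cudaSuccess) return 2;

    rotate_shared<<<1, WARP>>>(d_in, d_a);
    rotate_shuffle<<<1, WARP>>>(d_in, d_b);

    // cudaMemcpy on the default stream waits for the kernels to finish.
    if (cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost) != cudaSuccess) return 2;
    if (cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost) != cudaSuccess) return 2;

    // Both kernels should hand every lane the input of its right-hand neighbour.
    int ok = 1;
    for (int i = 0; i < WARP; ++i) {
        unsigned expected = h_in[(i + 1) % WARP];
        if (h_a[i] != expected || h_b[i] != expected) ok = 0;
    }
    printf(ok ? "shared and shuffle results match\n" : "MISMATCH\n");

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok ? 0 : 1;
}
```

The shuffle version needs no __shared__ buffer and no barrier, which is the attraction. Whether it is actually faster depends on the kernel, as the numbers above show.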