Using a Titan with the texture cache enabled, it seems to be performing with more luck.
Except that the Titan kernel is exactly identical to the 04/09 version.

So enabling the texture cache can't do any harm, because the texture cache code isn't included in that kernel. That's a good thing, I guess.
I guess that's good to know, now that I'm just finding out more about how the card behaves. Is there any reason the Titan can't benefit from the improvements made to the other kernels, though? Shouldn't it use the best kernel that isn't compiled for compute 3.5 until it's understood why the nvcc compiler seems to break 3.5?
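The fallback I'm suggesting could look roughly like this on the host side. This is only a minimal C++ sketch under assumptions: the KernelVariant struct, the variant names and the best-first ordering are hypothetical and not taken from the miner's source. In a real CUDA program the major/minor values would come from cudaGetDeviceProperties (cudaDeviceProp::major and ::minor).

```cpp
#include <string>
#include <vector>

// Hypothetical description of one compiled kernel variant.
struct KernelVariant {
    std::string name;   // illustrative label, not a real kernel name
    int minMajor;       // minimum compute capability, major part
    int minMinor;       // minimum compute capability, minor part
    bool builtForSm35;  // true if this variant was compiled specifically for compute 3.5
};

// True if the device capability major.minor is at least reqMajor.reqMinor.
static bool meets(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Returns the first usable variant from a list ordered best-first.
// With avoidSm35Builds set, variants compiled for compute 3.5 are skipped,
// so a 3.5 device falls back to the best kernel built for an older target.
std::string pickKernel(const std::vector<KernelVariant>& bestFirst,
                       int major, int minor, bool avoidSm35Builds) {
    for (const auto& k : bestFirst) {
        if (!meets(major, minor, k.minMajor, k.minMinor)) continue;
        if (avoidSm35Builds && k.builtForSm35) continue;
        return k.name;
    }
    return "";  // no compatible variant found
}
```

With a list like {sm35 variant, sm30 variant, sm20 variant}, a Titan (3.5) with the flag set would get the sm30 kernel instead of the 3.5 build.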