Using a Titan with the texture cache enabled, it seems to be having more luck.
Except that the Titan kernel is exactly identical to the 04/09 version.

So enabling the texture cache can't do any harm, because the code isn't included. That's a good thing, I guess.