A few last suggestions: use APP SDK 2.6 or 2.7. They are apparently the best for 7xxx cards, although I've not noticed any problems with 2.8. The only other thing I can think of is the version of pyopencl. Possibly an older version is faster for some reason?
Again, I'm using a fairly recent version of pyopencl (updated within the last month or two), so I'm not sure how much that helps you!
Finally, try the Catalyst 12.8 or 12.11 drivers. From memory, people had success with those. I'm just going on forum posts and things I remember, though, as I've never really had many problems on Linux.
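
If it helps when comparing setups, here is a rough sketch for printing which pyopencl version and which OpenCL platform/driver versions are actually in use. The function name `opencl_versions` is just something I made up for illustration. I believe the platform string on AMD usually includes the AMD-APP build, but treat that as an assumption and check what your own machine reports.

```python
def opencl_versions():
    """Return a list of lines describing the installed OpenCL stack."""
    try:
        import pyopencl as cl
    except ImportError:
        # pyopencl (or the OpenCL library it links against) is missing
        return ["pyopencl is not installed"]

    lines = ["pyopencl " + cl.VERSION_TEXT]
    try:
        for platform in cl.get_platforms():
            # platform.version is the vendor's OpenCL platform version string
            lines.append("platform: %s (%s)" % (platform.name, platform.version))
            for device in platform.get_devices():
                # driver_version is reported by the vendor driver for this device
                lines.append("  device: %s, driver %s"
                             % (device.name, device.driver_version))
    except cl.Error as exc:
        # Raised, for example, when no OpenCL platform is registered
        lines.append("could not query OpenCL platforms: %s" % exc)
    return lines


if __name__ == "__main__":
    print("\n".join(opencl_versions()))
```

Running it before and after swapping the SDK or driver should at least confirm that the change really took effect.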