You need to optimize the OpenCL drivers in order to reach that speed.
What do you mean by 'optimize the OpenCL drivers'? What drivers?
(I once found that SDK 3.0 is an insane register eater, while the previous version, either 2.9 or 3.0 beta (I don't remember exactly which), was faster but buggy.)