The hot spot in a Primecoin miner is testing numbers for primality with Fermat tests (base 2). Porting the sieve to the GPU is almost pointless, because only 1% of the computing time is spent there.
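For anyone unfamiliar with the test mentioned above, here is a minimal Python sketch of a base-2 Fermat test. It only illustrates the arithmetic (one modular exponentiation per candidate); the function name `fermat_base2` is made up for this post, and this is not the miner's actual code, which works on much larger numbers with an optimized bignum library.

```python
def fermat_base2(n: int) -> bool:
    """Return True if n passes the base-2 Fermat test (probable prime)."""
    if n < 3:
        return n == 2
    if n % 2 == 0:
        return False
    # Fermat's little theorem: if n is prime, then 2^(n-1) = 1 (mod n).
    # Three-argument pow() does the modular exponentiation efficiently.
    return pow(2, n - 1, n) == 1


if __name__ == "__main__":
    for candidate in (97, 91, 341):
        print(candidate, fermat_base2(candidate))
```

Note that 341 = 11 * 31 passes even though it is composite (a base-2 pseudoprime), so the test only reports probable primes. The cost is dominated by the modular exponentiation, which is why it can become the hot spot once candidates are hundreds of bits long.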
I was under the impression that the sieving part of the algorithm was taking up much more than 1% of the computing time. For example, currently my "Sieve/Test ratio" is at 75%/25%... or am I confusing things?