The hot spot in a Primecoin miner is testing numbers for primality with Fermat tests (base 2). Porting the sieve to the GPU is almost pointless, because only 1% of the computing time is spent there.
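To show what a "Fermat test with base 2" actually computes, here is a minimal sketch using Python's built-in three-argument `pow`. The function name `fermat_probable_prime_base2` is hypothetical, and a real miner works on much larger numbers with dedicated big-number code. The costly part is the single modular exponentiation. Note that the test only detects "probable primes": base-2 pseudoprimes such as 341 = 11 * 31 also pass.

```python
def fermat_probable_prime_base2(n: int) -> bool:
    """Return True if n passes a base-2 Fermat test, i.e. 2^(n-1) ≡ 1 (mod n).

    Passing does not prove primality: base-2 pseudoprimes such as
    341 = 11 * 31 also pass. A number that fails is certainly composite.
    """
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    # Modular exponentiation: this is the expensive step, and for the
    # large numbers a miner tests it dominates the cost of each test.
    return pow(2, n - 1, n) == 1


if __name__ == "__main__":
    print([n for n in range(2, 40) if fermat_probable_prime_base2(n)])
    print(fermat_probable_prime_base2(341))  # True, although 341 = 11 * 31
```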
I was under the impression that the sieving part of the algorithm was taking up much more than 1% of the computing time. For example, currently my "Sieve/Test ratio" is at 75%/25%... or am I confusing things?
Retracted that statement. Still, even if you made the 75% spent sieving 1000 times faster, you would still need the other 25% of the time for testing, so you wouldn't mine much faster.
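The argument above is Amdahl's law. A small sketch of the arithmetic follows, assuming the 75%/25% Sieve/Test split measures wall-clock time and that the two phases do not overlap. The helper name `overall_speedup` is hypothetical. With those numbers, a 1000x faster sieve gives about 3.99x overall, and even an infinitely fast sieve cannot exceed 4x, because the 25% spent on Fermat tests remains.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: whole-run speedup when only part of the work is accelerated."""
    # The part that is not accelerated still takes its full share of time.
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


if __name__ == "__main__":
    # Assumed split: 75% sieving (accelerated), 25% Fermat testing (unchanged).
    for factor in (10, 100, 1000, float("inf")):
        print(f"sieve {factor}x faster -> overall {overall_speedup(0.75, factor):.2f}x")
```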