Okay, the first GPU-enabled version is ready. Right now it only does the sieving on the GPU. Primality tests are still done on the CPU. The code hasn't been optimized at all yet; the performance is about the same (within a few percent) on these two setups:
1) Phenom x6 1055T, all 6 cores
2) Phenom x6 1055T, all 6 cores, plus an HD6990 using both GPU cores
I don't think the current version is worth releasing. It still needs a bit of work; I want it to be twice as fast first. I estimate there is room for a five-fold improvement, especially once I get the primality tests running on the GPU as well.
Tomorrow I will do tests using a slow CPU with lots of GPU power (Sempron 140 with 2x6990).
I think it is worth a release... At least we could have a pool set up based on the miner.