The primary reason I was holding back on the release was that the restructured sieve code seems to be far simpler to port to AVX/GPU (both CUDA & OpenCL) and, to be perfectly honest, I want (wanted) the accolade of being the guy who "cracked" GPU mining for XPM. While I have partially complete implementations of these versions (AVX & GPU), I have not released them yet as they do not currently build/work.
Wow! If this works, it could change the way primecoins are mined nowadays; individuals could mine profitably again, at least in the beginning.

The question is, how powerful is it? Have you already been able to run any benchmarks or unit tests on it? Or is it too early to tell?