The latest version (2011-08-04) has a major problem that I can see.
The assumption that there won't be more than 1 valid nonce per kernel execution is very wrong. At aggression 14 for example each kernel execution tests 2^30 nonces. The chance that there will be more than 1 valid nonce in any given kernel execution in this case is going to be about 2.5% (if I did the math right) This effectively causes a net loss in performance compared to the previous version at high aggression. At lower aggression values (10 and below) this is less of a problem since the performance loss in these cases will be much less than 1%.