(trimmed awesome find from the architecture manual)
So ">> i" may run faster than ">> (i % 32)" on x86 or x86_64 because the % is optimized out, but it's not a good idea: it isn't portable, and according to the C standard a >> whose shift count is negative or greater than or equal to the width of the (promoted) operand is undefined behavior. Since the miner runs this loop only once for each search of the 256-bit nonce, you can use i % 32 without any harm.
Maybe a bitwise AND instead of the % would be faster? Of course you should profile instead of believing me, but I think optimizing this is not worth the trouble.
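To make the two variants concrete, here is a minimal sketch, assuming the shifted word is a uint32_t and the loop counter is unsigned. The helper names shr_mod32 and shr_mask31 are made up for illustration and are not from the miner source.

```c
#include <stdint.h>

/* Hypothetical helper: shift right with the count reduced modulo 32,
 * so the count always stays in 0..31 and the shift is well defined. */
static uint32_t shr_mod32(uint32_t x, unsigned i)
{
    return x >> (i % 32u);
}

/* Same thing with a bitwise AND. For an unsigned count, i % 32 and
 * i & 31 always give the same value. */
static uint32_t shr_mask31(uint32_t x, unsigned i)
{
    return x >> (i & 31u);
}
```

For example, shr_mod32(0x80000000u, 33u) shifts by 1 and gives 0x40000000u, whereas a plain "0x80000000u >> 33" on a 32-bit operand is undefined in C.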
gatra
That explains it - thanks.
No optimization needed - the compiler will turn %32 into an AND mask anyway (at least when i is unsigned), and that part of the code isn't particularly performance critical right now.
I'll start going through and slowly cleaning up a few more of these, in addition to any that any of you spot. Thanks again for the bug spotting!