I was wondering when the community would discover this optimization... nice one bitless :-)
For the record, hdminer has implemented this maj() optimization since day 1:
# ibit_extract patched to BFI_INT at runtime
$code .=
" ixor $tmp0, $a, $b\n".
" ibit_extract $tmp0, $a, $c, $tmp0\n";
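For anyone who wants to convince themselves that the two instructions above really compute maj(), here is a minimal Python sketch of the identity. It assumes BFI_INT behaves as a plain bitwise select (take bits from one operand where the mask is 1, from the other where it is 0); the helper names `bfi`, `maj_reference` and `maj_bfi` are made up for illustration and are not from hdminer, and the argument order of `bfi` here is not meant to mirror the hardware operand order.

```python
MASK32 = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def bfi(mask, x, y):
    """Bitwise select: bits of x where mask is 1, bits of y where mask is 0.

    Assumed model of what BFI_INT computes; operand order is illustrative.
    """
    return ((mask & x) | (~mask & y)) & MASK32


def maj_reference(a, b, c):
    """Textbook SHA-256 majority function."""
    return ((a & b) | (a & c) | (b & c)) & MASK32


def maj_bfi(a, b, c):
    """maj() as one XOR plus one bitwise select.

    Where a and b differ, c casts the deciding vote, so take c.
    Where a and b agree, they already are the majority, so take a.
    """
    return bfi(a ^ b, c, a)


if __name__ == "__main__":
    # Exhaustive check over single bits covers every bit position,
    # because both forms are purely bitwise.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                assert maj_bfi(a, b, c) == maj_reference(a, b, c)
    print("maj identity holds")
```

The point of the trick is the instruction count: the textbook form needs several AND/OR operations, while this form needs one XOR and one select.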
Phoenix is probably very close to hdminer's performance now, on HD 69xx.
So this accounts for 3% of your 6.4% improvement; where is the other 3.4% coming from? I don't have 250 BTC; in fact I only have ~2 that have been donated to me so far. (Please note - if you decide to share this, post it, don't PM me