I was wondering when the community would discover this optimization... nice one, bitless :-)
For the record, hdminer has used this maj() optimization since day 1:
# ibit_extract patched to BFI_INT at runtime
# maj(a,b,c): where a and b differ take c, otherwise take a (mask = a ^ b)
$code .=
" ixor $tmp0, $a, $b\n".
" ibit_extract $tmp0, $a, $c, $tmp0\n";
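For anyone wondering why two instructions are enough: Maj(a,b,c) takes the majority of each bit position. Where a and b agree, that bit is already the answer. Where they differ, c breaks the tie. So a ^ b works as a per-bit select mask, and a bitfield insert picks c or a accordingly. Below is a quick Python sketch (not hdminer code) that checks this against the textbook formula. `bfi_int` is a hypothetical software model of the GPU instruction, assuming BFI_INT computes (mask & x) | (~mask & y).

```python
import random

MASK32 = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


def bfi_int(mask: int, x: int, y: int) -> int:
    """Hypothetical model of BFI_INT (assumed semantics):
    bits of x where mask is 1, bits of y where mask is 0."""
    return ((mask & x) | (~mask & y)) & MASK32


def maj_ref(a: int, b: int, c: int) -> int:
    """Textbook SHA-256 Maj: (a AND b) XOR (a AND c) XOR (b AND c)."""
    return (a & b) ^ (a & c) ^ (b & c)


def maj_bfi(a: int, b: int, c: int) -> int:
    """Two-op Maj: one XOR to build the mask, one bitfield insert.
    Where a and b agree the result is a; where they differ, c decides."""
    return bfi_int(a ^ b, c, a)


def check(trials: int = 10_000, seed: int = 0) -> bool:
    """Compare both versions on a full truth table and on random words."""
    # The 8 low bit positions of 0xF0, 0xCC, 0xAA cover all 8 (a, b, c) combos.
    if maj_ref(0xF0, 0xCC, 0xAA) != maj_bfi(0xF0, 0xCC, 0xAA):
        return False
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.getrandbits(32) for _ in range(3))
        if maj_ref(a, b, c) != maj_bfi(a, b, c):
            return False
    return True


print(check())  # True
```

The textbook form needs several ANDs plus XORs per round, while the select form needs only the XOR and one BFI_INT, which is where the saving comes from.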
Phoenix is now probably very close to hdminer's performance on HD 69xx cards.