I propose to compile sha256.cpp with -O3 -march=amdfamk10 (will work on 32bit and 64bit) as only CPUs supporting this instruction set (AMD Phenom, Intel i5 and newer) benefit from -4way and it'll improve performance by ~9%.
GCC 4.3.3 doesn't support -march=amdfamk10. I get:
sha256.cpp:1: error: bad value (amdfamk10) for -march= switch