And to talk to myself...I think I found the problem. Let me see if I can quickly fix it.
Edit: Fixed and pushed to http://github.com/chromicant/cpuminer on the sse2 branch. Can people pull from it and test to make sure it builds on other machines? It works on my Ubuntu 10.10 x86_64 box with YASM 1.1.0.2352.
It also contains a minor update to the assembly code to remove a few pipeline stalls. I was playing with a profiler and looking at some data people posted here, and made some minor fixes. I don't really see any noticeable speedup, but it should be there in theory. And we know how well theory holds up in practice.