The default Bitmain 4.6.1 has been working pretty well for me; I get 1.9 TH/s on average. You do have to set pseudo difficulty, though, since it doesn't work properly without it. I haven't tried the latest version from ckolivas that was released within the last 24 hours, but I will try it now.
If you used pseudo difficulty with the existing S4 binary, you have circumvented the bulk of the issues, but there is still some gain with the new binary I provided. Just remove any queue value from the init script since 8192 is a crazy default.
OK. I tried your new binary and it appears to perform the same, I assume because I'm using pseudo difficulty as you stated. BTW, I am using "--queue 0 --scan-time 1 --expiry 1", which are the typically recommended settings for p2pool.
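
For anyone following along, here is a rough sketch of how those options might be put together on a cgminer command line. The pool URL, the payout address placeholder, and the 1024 value are all hypothetical, and the "+1024" suffix assumes p2pool's convention of requesting pseudo difficulty through the worker name. The snippet only builds and prints the command; it does not edit the S4 init script.

```shell
# Hypothetical values: replace with your own p2pool node and payout address.
POOL_URL="stratum+tcp://127.0.0.1:9332"   # assumed local p2pool node
WORKER="YOUR_BTC_ADDRESS+1024"            # "+1024" assumes p2pool's pseudo-difficulty suffix; the value is illustrative

# Options quoted in the post above: no work queue, short scan time and expiry.
CGMINER_ARGS="-o $POOL_URL -u $WORKER -p x --queue 0 --scan-time 1 --expiry 1"

# Print the resulting command line for review instead of running it.
echo "cgminer $CGMINER_ARGS"
```

Setting "--queue 0" explicitly also covers the earlier advice about not leaving a large queue value in the init script.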