Apple wrote their own OpenCL implementation for OSX instead of using far more optimized ones supplied by AMD or Nvidia. Using OSX means you lose about 20% of your speed right off the bat. Try toying with -z 0 in combination with -w 128.
Unrecognized option: -z
So I have the binary and the option isn't there. I had to download the source and compile it myself. I get no speed increase at all...
By the way, it would be great to have more documentation about what exactly each switch does and how to compile it; I had to dig for that.

I only recently added it. Make sure you have the absolute newest binary. Try values somewhere between -z 0 and -z 20; you might get a speed increase, you might not. I don't think OSX's implementation has bitalign, so you're losing a lot of performance and there might be no way to fix it.
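To show why a missing bitalign hurts, here is a rough sketch in plain C rather than the actual kernel code. It assumes the kernel leans on 32-bit rotates, which is an assumption on my part. The helper names (bitalign_emul, rotr_bitalign, rotr_generic) are made up for illustration; bitalign_emul follows the documented semantics of amd_bitalign from the cl_amd_media_ops extension.

```c
#include <stdint.h>

/* Stand-in for amd_bitalign(hi, lo, shift): treat hi:lo as one 64-bit
   value, shift it right by (shift & 31), and keep the low 32 bits.
   On hardware that supports it, this is a single instruction. */
static uint32_t bitalign_emul(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | (uint64_t)lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* With bitalign, a rotate right is one operation: feed x in as both halves. */
static uint32_t rotr_bitalign(uint32_t x, uint32_t n)
{
    return bitalign_emul(x, x, n);
}

/* Without bitalign, the same rotate takes two shifts and an OR. */
static uint32_t rotr_generic(uint32_t x, uint32_t n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Both rotate helpers return the same value; the difference is only how many operations the GPU would need per rotate if the fast path is unavailable.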