If you underclock your video RAM, you'll see improvements using a worksize of 256, in my experience.
My settings:
GPU: 1065 MHz
RAM: 300 MHz
VCC: 1.11 V
phoenix.py -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Yields me 235 MH/s on a Radeon HD 5770.
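If you want to compare worksizes without retyping the whole line each time, something like this rough Python sketch could assemble the command. The helper name `build_phoenix_args` is made up for illustration; it only uses the kernel name and options from the command above, and leaves out any server/connection options, just as that command does.

```python
# Hypothetical helper (not part of phoenix): builds the argument list
# for the command shown above from a kernel name, bare flags, and
# NAME=value options.
def build_phoenix_args(kernel, flags, options):
    args = ["phoenix.py", "-k", kernel]
    args.extend(flags)  # bare flags such as VECTORS, BFI_INT
    args.extend(f"{name}={value}" for name, value in options.items())
    return args


# Settings taken from the post; only WORKSIZE is varied.
base_options = {"FASTLOOP": "false", "AGGRESSION": 12}

for worksize in (64, 128, 256):
    options = dict(base_options, WORKSIZE=worksize)
    print(" ".join(build_phoenix_args("phatk", ["VECTORS", "BFI_INT"], options)))
```

Run each printed line for a few minutes and compare the reported hash rate; the worksize values other than 256 are just candidates to try, not recommendations.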
lol, I'm already doing a little better than 235, but I guess that's just the difference in cards.

Thanks for the tip, I'll let you know if I can get it up and running. I tried a GPU-up/mem-down underclock at stock voltage but could not get anywhere near 1000 MHz for the GPU or 300 MHz for the memory.