Thanks!
I'm using -k phatk DEVICE=0 VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=13 on a single 5830 and getting 312 MH/s.

I have always found phatk to be slower. Why did you choose to use it? Also, why WORKSIZE=256? I have also found 128 to be faster on the 58xx and 69xx series.
I am using poclbm at the moment and getting a 0.2% stale rate [not using BTCMine at the moment, though].
I chose phatk because I saw other posts recommend it for 5830/5850 cards, and it is faster for me by 9-10 MH/s. WORKSIZE=256 with phatk is also faster for me (by another ~4 MH/s).
Here are some results from my testing:
297 MH/s = poclbm & WORKSIZE=128
304 MH/s = poclbm & WORKSIZE=256
308 MH/s = phatk & WORKSIZE=128
312 MH/s = phatk & WORKSIZE=256
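The four measurements can be compared as relative gains. Here is a small Python sketch using only the numbers above; the dictionary layout and the helper name gain_pct are just for illustration:

```python
# Hash rates (MH/s) reported above for a single HD 5830, keyed by (kernel, WORKSIZE).
results = {
    ("poclbm", 128): 297,
    ("poclbm", 256): 304,
    ("phatk", 128): 308,
    ("phatk", 256): 312,
}

def gain_pct(new: float, old: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100

# Combination with the highest measured hash rate.
best = max(results, key=results.get)
print("fastest:", best, results[best], "MH/s")

# Kernel comparison at a fixed WORKSIZE.
for size in (128, 256):
    g = gain_pct(results[("phatk", size)], results[("poclbm", size)])
    print(f"phatk vs poclbm at WORKSIZE={size}: {g:+.1f}%")

# WORKSIZE comparison for a fixed kernel.
for kernel in ("poclbm", "phatk"):
    g = gain_pct(results[(kernel, 256)], results[(kernel, 128)])
    print(f"{kernel}: WORKSIZE 256 vs 128: {g:+.1f}%")
```

With these numbers phatk comes out about 3.7% ahead at WORKSIZE=128 and 2.6% ahead at 256, while going from 128 to 256 gains 2.4% (poclbm) and 1.3% (phatk). The differences are only a few percent, so normal run-to-run variation could matter when reading them.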
I am getting quite a few rejected shares, though, so I'm not sure whether that has anything to do with the above settings or something else.
Rejects are higher now with the incredible pool hash rates. The answer is probably a new protocol to replace RPC + long polling (LP), but your rejected shares could also come from your card making calculation errors and returning invalid hash results.
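To judge whether rejects are unusually high, it can help to express them as a fraction of all submitted shares, the same way the 0.2% stale rate above is quoted. A minimal sketch; the function name and the share counts in the example are hypothetical, not taken from this thread:

```python
def reject_rate(accepted: int, rejected: int) -> float:
    """Rejected shares as a percentage of all submitted shares."""
    total = accepted + rejected
    if total == 0:
        return 0.0  # nothing submitted yet
    return rejected / total * 100

# Hypothetical counts, for illustration only.
print(f"{reject_rate(998, 2):.1f}%")   # 0.2%
print(f"{reject_rate(950, 50):.1f}%")  # 5.0%
```

Comparing this rate before and after changing clocks or WORKSIZE may help tell the two causes mentioned above apart, though it is only a rough indicator.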
I will have to play with WORKSIZE=256 on my cards again. I never tried anything but 128 on my 5850, but when I tried 256 on my 6970, the hash rate reported by the miner dropped noticeably. I will try that again too.

EDIT: I just tested WORKSIZE=256 on my 6970 and the rate dropped from 393 MH/s to 323 MH/s. I find it hard to believe that the 5850 would be better suited to 256 than the 6970, but perhaps it is. I don't have time to test on my other box with the 5850 at the moment. BTW, I now get 319 MH/s with my 5850, otherwise stock, using WORKSIZE=128 (I pushed the core clock to 850 MHz and reduced the memory clock slightly to 900 MHz). Results are identical with both poclbm and phoenix [using the poclbm kernel].
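The thread suggests the best WORKSIZE depends on the card. A short Python sketch that picks the faster setting per card from the figures quoted here; the card labels and the helper best_worksize are illustrative, and the 5830 entry uses the phatk numbers from the earlier post:

```python
# Hash rates (MH/s) by WORKSIZE, as reported in this thread.
measurements = {
    "HD 5830 (phatk)": {128: 308, 256: 312},
    "HD 6970": {128: 393, 256: 323},
}

def best_worksize(rates: dict[int, float]) -> int:
    """Return the WORKSIZE with the highest measured hash rate."""
    return max(rates, key=rates.get)

for card, rates in measurements.items():
    ws = best_worksize(rates)
    change = (rates[256] - rates[128]) / rates[128] * 100
    print(f"{card}: best WORKSIZE={ws}, 256 vs 128: {change:+.1f}%")
```

On these numbers, 256 is slightly better for the 5830 (+1.3%) while the 6970 loses about 17.8% at 256. Measuring each card separately seems safer than assuming one WORKSIZE suits every card.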