Works very well. I am getting over 440 Mhash/s on an HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT, and about 416 Mhash/s on poclbm. However, my other HD 5870 running at 950/375 with the same switches only hashes about 410 Mhash/s with phatk, while poclbm gives about 400 Mhash/s.
Just out of curiosity, how do you tell what worksize you need for a specific card?
There is no general rule. It mostly depends on the architecture and the memory technology used. In heavy scientific calculations the best worksize is usually the one the card can process natively, but in mining, where a single loop is very simple and fast, the optimal worksize can vary. Lowering memory clocks saves power and may therefore leave room for extra overclock on the core, which speeds up computation. If you lower the memory clocks too much you can lose hashing speed, but that kind of loss can be compensated for by lowering the worksize.
So without a solid background in high-speed computing architectures, the fastest way to find out is to try all the possible combinations.
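If you don't want to do that by hand, the "try everything" approach can be scripted. Below is a rough Python sketch, not anything that ships with the miners. The names `sweep` and `fake_measure` are made up for illustration, and the candidate worksize and aggression values are only assumptions. In real use you would swap `fake_measure` for your own function that starts the miner with the given settings, lets it settle, and returns the reported Mhash/s.

```python
from itertools import product


def sweep(measure, worksizes=(64, 128, 256), aggressions=(11, 12, 13)):
    """Try every (worksize, aggression) pair and return results, fastest first.

    `measure` is a hypothetical callable you supply:
    measure(worksize, aggression) -> hash rate in Mhash/s.
    """
    results = []
    for worksize, aggression in product(worksizes, aggressions):
        rate = measure(worksize, aggression)
        results.append((rate, worksize, aggression))
    # Highest hash rate first.
    results.sort(reverse=True)
    return results


def fake_measure(worksize, aggression):
    # Placeholder with made-up numbers so the script runs on its own.
    # These are NOT real benchmark figures for any card.
    return 400.0 + aggression - abs(worksize - 128) / 64.0


if __name__ == "__main__":
    for rate, worksize, aggression in sweep(fake_measure):
        print(f"WORKSIZE={worksize} AGGRESSION={aggression}: {rate:.1f} Mhash/s")
```

Keep the clocks fixed while the sweep runs, and rerun it after changing the memory clock, since the best worksize can move with it.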