With
-D 1 -v 1 -w 256 -aa
I get ~539 MHash/s ... this should be the preferred command line, right?
I'm currently testing other kernels for GCN performance.

Dia
-D is useless unless you're turning off other cards. -w 256 is the default. -v 1 is the default. -aa does _absolutely nothing_, and I've already renamed it to something else in a local branch.
For the first time in history, running with no arguments is the best. I have no clue how the hell that happened.