I get the same, ~580 MK/s.
OK, thank you for the test.
With the optimizations suggested by arulbero, a few memory transfer improvements, and some GPU-specific intrinsics (notably the funnel shift, which should improve SHA and RIPEMD performance), I hope to reach 1 GK/s on your config.
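
For anyone wondering why the funnel shift matters: SHA and RIPEMD rounds are full of 32-bit rotations, and a rotation is just a funnel shift with the same word fed in as both halves. Below is a rough sketch of the idea, not the actual code from the project. The names `funnelshift_l_model`, `funnelshift_r_model`, `rotl32`, `rotr32` and `sha256_big_sigma0` are made up for illustration. When compiled for the device it uses the CUDA intrinsics `__funnelshift_l` / `__funnelshift_r`; on the host it falls back to a 64-bit model of the same behaviour so the logic can be checked without a GPU.

```cpp
#include <cstdint>

// Lets the same file build with nvcc or a plain C++ compiler.
#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Host-side model of the left funnel shift: concatenate hi:lo into 64 bits,
// shift left by (shift & 31), keep the upper 32 bits.
HD inline uint32_t funnelshift_l_model(uint32_t lo, uint32_t hi, uint32_t shift) {
    uint64_t wide = (static_cast<uint64_t>(hi) << 32) | lo;
    return static_cast<uint32_t>((wide << (shift & 31)) >> 32);
}

// Host-side model of the right funnel shift: same concatenation,
// shift right by (shift & 31), keep the lower 32 bits.
HD inline uint32_t funnelshift_r_model(uint32_t lo, uint32_t hi, uint32_t shift) {
    uint64_t wide = (static_cast<uint64_t>(hi) << 32) | lo;
    return static_cast<uint32_t>(wide >> (shift & 31));
}

// Rotate left: funnel shift with the same word as both halves.
HD inline uint32_t rotl32(uint32_t x, uint32_t n) {
#if defined(__CUDA_ARCH__)
    return __funnelshift_l(x, x, n);      // hardware intrinsic on the device
#else
    return funnelshift_l_model(x, x, n);  // portable fallback on the host
#endif
}

// Rotate right, same trick with the right funnel shift.
HD inline uint32_t rotr32(uint32_t x, uint32_t n) {
#if defined(__CUDA_ARCH__)
    return __funnelshift_r(x, x, n);
#else
    return funnelshift_r_model(x, x, n);
#endif
}

// Example consumer: the SHA-256 "big sigma 0" function, which is three rotations.
HD inline uint32_t sha256_big_sigma0(uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}
```

Whether this actually gains anything depends on what the compiler already emits for the usual `(x << n) | (x >> (32 - n))` pattern on a given architecture, so it is worth checking the generated code before and after.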
