FWIW, I think the stales issue is related to the size of the respective pools. I don't know where the bottleneck is at ARS, but it could be bitcoind not providing work fast enough if they aren't using a patched version (unlikely). More likely it's the software in between (I don't know what they're using; pushpool?), which has to handle all the requests and record the outgoing work. Either optimising that software, or alternatively running multiple instances of it, should help.