What you're noticing is the PoW for your next transaction being precached. I designed the PoW so that it uses information from your previous transaction, so in the usual case you can fire off one transaction instantly, and the work-generation latency is hidden in the time before your next one.
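To make the precaching idea concrete, here is a rough sketch in Python. It is not the actual wallet code: the hash layout, the `THRESHOLD` value, and the `WorkCache` class are all assumptions chosen so the example runs in a fraction of a second. The point is only that the work for the next transaction can be computed as soon as the previous one is known.

```python
import hashlib
import itertools

# Toy difficulty (assumption): about 1 in 4096 nonces passes.
# A real network would use a far higher threshold.
THRESHOLD = 0xFFF0000000000000


def work_value(root: bytes, nonce: int) -> int:
    """Hash the nonce together with the root and read the result as an integer."""
    data = nonce.to_bytes(8, "little") + root
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def generate_work(root: bytes) -> int:
    """Brute-force the first nonce whose work value meets the threshold."""
    for nonce in itertools.count():
        if work_value(root, nonce) >= THRESHOLD:
            return nonce


class WorkCache:
    """Hypothetical cache keyed by the hash of the previous transaction."""

    def __init__(self):
        self._cache = {}

    def precache(self, root: bytes) -> None:
        # Called right after a transaction is published, while the user is idle.
        self._cache[root] = generate_work(root)

    def take(self, root: bytes) -> int:
        # Instant if the work was precached, otherwise computed on demand.
        nonce = self._cache.pop(root, None)
        if nonce is None:
            nonce = generate_work(root)
        return nonce


previous_hash = hashlib.blake2b(b"previous transaction", digest_size=32).digest()
cache = WorkCache()
cache.precache(previous_hash)       # latency is paid here, in the background
nonce = cache.take(previous_hash)   # the next send gets its work immediately
```

If two transactions are sent back to back, the second call to `take` misses the cache and pays the full generation cost, which is the delay being discussed here.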
The OpenCL module I'm putting together should help with part of this for people who need to handle higher transaction loads.
I have no problem with the PoW itself. I DO have a problem with it blocking vital operations such as RPC calls.
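To illustrate what I mean by not blocking, here is a minimal Python sketch where work generation runs on a separate worker and the RPC handler only checks its status. The function names (`request_work`, `handle_rpc_work_status`) and the toy threshold are made up for the example and are not the node's real API.

```python
import hashlib
import itertools
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 0xFFF0000000000000  # toy difficulty (assumption), about 1 in 4096


def generate_work(root: bytes) -> int:
    """Brute-force a nonce for the given root (toy PoW, not the real scheme)."""
    for nonce in itertools.count():
        data = nonce.to_bytes(8, "little") + root
        digest = hashlib.blake2b(data, digest_size=8).digest()
        if int.from_bytes(digest, "little") >= THRESHOLD:
            return nonce


# A dedicated worker keeps PoW off the thread that serves RPC requests.
work_pool = ThreadPoolExecutor(max_workers=1)
pending = {}  # root -> Future holding the nonce


def request_work(root: bytes):
    """Queue work generation and return immediately with a Future."""
    if root not in pending:
        pending[root] = work_pool.submit(generate_work, root)
    return pending[root]


def handle_rpc_work_status(root: bytes) -> str:
    """Hypothetical RPC handler: answers right away, never waits on the PoW."""
    future = pending.get(root)
    if future is None:
        return "unknown"
    return "ready" if future.done() else "pending"
```

In Python, CPU-bound threads still share the interpreter lock, so a real node would want native threads or a separate process for the work, but the structure is the same: the RPC path only looks at the state of the job and returns.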