860k tries per second per core here (Intel(R) Core(TM) i5 CPU @ 2.67GHz)
Yeah, btcrecover has a lot of scaffolding to actually generate the passwords to test... still that's faster than I expected.
But I'd guess it could be improved if written entirely in C, or even better in OpenCL.
I don't think a GPU can beat a CPU with AES-NI.
That could be true (I doubt PyCrypto uses AES-NI). Although the oclHashcat guys have done some impressive things, including password generation and AES decryption all on a GPU, which is far more than I've managed (I only do the SHA hashing for Bitcoin Core on the GPU, everything else is on the CPU).